Document Understanding Activities
Last updated March 28, 2024

Classify Document Scope

UiPath.IntelligentOCR.Activities.DocumentClassification.ClassifyDocumentScope

Provides a scope for classifier activities and supplies all of the files they need to perform document classification. Accepts one or more classifiers and brokers between them, ensuring that all parameters are forwarded to the child classification activities.

Properties

Common
  • DisplayName - The display name of the activity.
Input
  • DocumentObjectModel - The Document Object Model to validate the document against. This model is stored in a Document variable and can be retrieved from the Digitize Document activity. See the documentation of that activity for details. This field supports only Document variables.
  • DocumentPath - The path to the document you want to validate. This field supports only strings and String variables.

    Note: The supported file types for this property field are .png, .gif, .jpe, .jpg, .jpeg, .tiff, .tif, .bmp, and .pdf.
  • DocumentText - The text of the document itself, stored in a String variable. This value can be retrieved from the Digitize Document activity. Please see the documentation of the activity for more information on how to do this. This field supports only strings and String variables.
  • Taxonomy - The Taxonomy against which the document is to be processed, stored in a DocumentTaxonomy variable. This field supports only DocumentTaxonomy variables.
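
Because DocumentPath accepts only the file types listed in the note above, a workflow may want to check a path's extension before passing it to the scope. The following is a minimal Python sketch of such a check, not part of the UiPath API. The function name is_supported_document is hypothetical, and treating extensions as case-insensitive is an assumption.

```python
from pathlib import Path

# Extensions taken from the DocumentPath note.
SUPPORTED_EXTENSIONS = {
    ".png", ".gif", ".jpe", ".jpg", ".jpeg", ".tiff", ".tif", ".bmp", ".pdf",
}


def is_supported_document(path: str) -> bool:
    """Return True if the path ends in one of the listed extensions.

    Hypothetical helper; the comparison ignores case, which is an assumption.
    """
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, is_supported_document("scan.tif") is true, while a path ending in .docx is rejected.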
Misc
  • Private - If selected, the values of variables and arguments are no longer logged at Verbose level.
Output
  • ClassificationResults - The results of running the classifiers on the specified file, stored in an IReadOnlyList<ClassificationResult> object. This field supports only IReadOnlyList<ClassificationResult> variables.

The ClassificationResult object contains:

  • DocumentTypeId - The ID corresponding to the document type matched from the Taxonomy.
  • DocumentId - The file name of the processed document.
  • ContentType - The type of content contained in the processed document.
  • Confidence - Classification confidence, displayed as a numeric value between 0 and 1.
  • OcrConfidence - OCR confidence for the characters that are part of the reported reference, displayed as a numeric value between 0 and 1.
  • Reference - Evidence for the classification, both in the text version of the document (through TextStartIndex and TextLength) and in the Document Object Model (through Tokens and the highlight boxes for each page from which the evidence is selected).
  • DocumentBounds - Information on what part of the document the classification pertains to, with StartPage (Int32, 0-based), PageCount (Int32), TextStartIndex (Int32, 0-based), TextLength (Int32).
  • ClassifierName - Automatically populated by the Classify Document Scope activity with the display name of the classifier reporting the current ClassificationResult.

    Note: The entries in ClassificationResults are sorted in descending order by confidence score, so the first entry has the highest confidence.
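
To make the shape of a result concrete, here is a minimal Python sketch that mirrors some of the fields described above. It is an illustration only, not the UiPath .NET type: the snake_case names and both helper functions are hypothetical, and ContentType and Reference are omitted for brevity. It shows the descending sort by confidence and how the 0-based TextStartIndex and TextLength of DocumentBounds could select a slice of the document text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DocumentBounds:
    start_page: int        # 0-based
    page_count: int
    text_start_index: int  # 0-based
    text_length: int


@dataclass
class ClassificationResult:
    document_type_id: str
    document_id: str
    confidence: float      # between 0 and 1
    ocr_confidence: float  # between 0 and 1
    classifier_name: str
    document_bounds: DocumentBounds


def sort_by_confidence(results: List[ClassificationResult]) -> List[ClassificationResult]:
    """Return the results with the highest confidence first."""
    return sorted(results, key=lambda r: r.confidence, reverse=True)


def bounded_text(document_text: str, bounds: DocumentBounds) -> str:
    """Return the part of the document text that the bounds refer to."""
    start = bounds.text_start_index
    return document_text[start:start + bounds.text_length]
```

With results sorted this way, the first element plays the role of the top entry mentioned in the note.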

Using the Configure Classifiers Wizard

The Configure Classifiers Wizard allows you to configure the way the classifiers are applied to each document type, and what results are acceptable.

Follow the steps below to configure the wizard:

  1. Add a Classify Document Scope activity to your workflow.
  2. Place one or more Classifier activities inside the Classify Document Scope activity.

    • Give your classifiers descriptive names.
    • Order the classifiers within the scope, from left to right, in order of acceptance priority.
    • Click the Configure Classifiers button to configure your classifiers.
    • The wizard opens.



  3. Select the checkboxes for the classifier and document type pairs you want to activate. Leaving a document type unchecked for a certain classifier means that (1) the classifier is not trained or configured to identify that particular document type; or (2) the classifier does not perform as expected for that particular document type, and if such results are returned by the classifier, they should be ignored.
  4. If a classifier has its own taxonomy, then use the text boxes next to each checkbox to set the correct Taxonomy mapping between the two taxonomies. For example, if Classifier1 has been configured to return class INV for an Invoice, but your project taxonomy contains a document type called "Incoming Invoice", then the box corresponding to "Incoming Invoice" and that particular Classifier1 should contain the string INV.
  5. Select a minimum confidence threshold for each of your classifiers. Acceptable values are between 0 (no minimum confidence) and 100. If a classifier returns a classification result with a confidence lower than the set threshold, the Classify Document Scope will ignore that classification result and not report it.
  6. Click on the Save button once all the classifiers are configured.
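
The effect of steps 3 to 5 can be sketched in Python as follows. This is an illustration under stated assumptions, not the activity's actual implementation: all names are hypothetical, the 0 to 100 threshold is assumed to be a percentage compared against the 0 to 1 confidence, and "acceptance priority" is assumed to mean that the first classifier, in order, with an accepted result wins.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class RawResult:
    classifier_name: str
    classifier_class: str  # label as reported by the classifier, e.g. "INV"
    confidence: float      # between 0 and 1


@dataclass
class ClassifierConfig:
    name: str
    # Checked pairs only: classifier label -> project taxonomy document type.
    class_to_document_type: Dict[str, str]
    min_confidence_percent: float = 0.0  # 0 means no minimum


def filter_results(config: ClassifierConfig,
                   raw_results: List[RawResult]) -> List[Tuple[str, float]]:
    """Keep one classifier's results that are mapped and meet its threshold."""
    accepted = []
    for result in raw_results:
        if result.classifier_name != config.name:
            continue
        document_type = config.class_to_document_type.get(result.classifier_class)
        if document_type is None:
            continue  # pair left unchecked in the wizard (step 3)
        if result.confidence * 100 < config.min_confidence_percent:
            continue  # below the minimum confidence (step 5)
        accepted.append((document_type, result.confidence))
    return accepted


def first_accepted(configs_in_priority_order: List[ClassifierConfig],
                   raw_results: List[RawResult]) -> Optional[Tuple[str, float]]:
    """Assumed priority rule: the first classifier with an accepted result wins."""
    for config in configs_in_priority_order:
        accepted = filter_results(config, raw_results)
        if accepted:
            return max(accepted, key=lambda pair: pair[1])
    return None
```

In this sketch, a Classifier1 configured with the mapping {"INV": "Incoming Invoice"} and a threshold of 80 would have a result of class INV at confidence 0.6 ignored, and the next classifier in order would be consulted.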



Document Understanding Integration

The Classify Document Scope activity is part of the Document Understanding Solutions. Visit the Document Understanding Guide for more information.

© 2005-2024 UiPath. All rights reserved.