Document Understanding Activities
Last updated Oct 8, 2024

Classify Document Scope

UiPath.IntelligentOCR.Activities.DocumentClassification.ClassifyDocumentScope

Description

Provides a scope for classifier activities and supplies all the files needed to perform document classification. Accepts one or more classifiers and brokers between them, ensuring that all parameters are forwarded to the child classification activities.

Project compatibility

Windows-Legacy | Windows

Configuration

Properties panel

Common
  • DisplayName - The display name of the activity.
Input
  • DocumentObjectModel - The Document Object Model (DOM) against which you want to validate the document. This model is stored in a Document variable and can be retrieved from the Digitize Document activity. Visit Digitize Document to learn how to use the activity. This field supports only Document variables.
  • DocumentPath - The path to the document you want to validate. This field supports only strings and String variables.
    Note: The supported file types for this property field are .png, .gif, .jpe, .jpg, .jpeg, .tiff, .tif, .bmp, and .pdf.
  • DocumentText - The text of the document itself, stored in a String variable. You can retrieve this value from the Digitize Document activity. Visit Digitize Document to learn how to use this activity. This field supports only strings and String variables.
  • Taxonomy - The Taxonomy against which the document is to be processed, stored in a DocumentTaxonomy variable. This field supports only DocumentTaxonomy variables.
Misc
  • Private - If selected, the values of variables and arguments are no longer logged at Verbose level.
Output
  • ClassificationResults - The results of running the classifiers on the specified file, stored in an IReadOnlyList<ClassificationResult> object. This field supports only IReadOnlyList<ClassificationResult> variables.

The ClassificationResult object contains the following information:

  • DocumentTypeId - The ID corresponding to the document type matched from the Taxonomy.
  • DocumentId - The file name of the processed document.
  • ContentType - The type of content contained in the processed document.
  • Confidence - Classification confidence, displayed as a numeric value between 0 and 1.
  • OcrConfidence - OCR confidence for the characters that are part of the reported reference, displayed as a numeric value between 0 and 1.
  • Reference - Evidence for the classification, both in the text version of the document (through TextStartIndex and TextLength), and in the Document Object Model (through Tokens and the highlight boxes for each page from which the evidence is selected).
  • DocumentBounds - Information on what part of the document the classification pertains to, with StartPage (Int32, 0-based), PageCount (Int32), TextStartIndex (Int32, 0-based), TextLength (Int32).
  • ClassifierName - Automatically populated by the Classify Document Scope activity with the display name of the classifier reporting the current ClassificationResult.

    Note: The items in ClassificationResults are sorted in descending order by confidence score, so the first item has the highest confidence.
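
To make the shape of the output concrete, the following Python sketch models a subset of the ClassificationResult fields listed above, reads the top result, and uses DocumentBounds to slice the matching text. The dataclasses and helper functions are hypothetical stand-ins written for illustration, not the UiPath types, and the sample values are invented. In a workflow you would read the same properties from the IReadOnlyList<ClassificationResult> variable.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DocumentBounds:
    """Hypothetical mirror of DocumentBounds; page and text indices are 0-based."""
    start_page: int
    page_count: int
    text_start_index: int
    text_length: int


@dataclass
class ClassificationResult:
    """Hypothetical mirror of a subset of the ClassificationResult fields."""
    document_type_id: str
    document_id: str
    confidence: float      # between 0 and 1
    ocr_confidence: float  # between 0 and 1
    document_bounds: DocumentBounds
    classifier_name: str


def top_result(results: List[ClassificationResult]) -> Optional[ClassificationResult]:
    """Return the first item, which has the highest confidence when the list is
    sorted in descending order, or None when nothing was classified."""
    return results[0] if results else None


def classified_text(document_text: str, bounds: DocumentBounds) -> str:
    """Slice the part of the document text that the classification pertains to."""
    start = bounds.text_start_index
    return document_text[start:start + bounds.text_length]


# Illustrative values only.
text = "INVOICE 001 total 42.00 RECEIPT thanks"
results = [
    ClassificationResult("Invoice", "scan.pdf", 0.93, 0.88,
                         DocumentBounds(0, 1, 0, 23), "Keyword Classifier"),
    ClassificationResult("Receipt", "scan.pdf", 0.41, 0.80,
                         DocumentBounds(0, 1, 24, 14), "Keyword Classifier"),
]
best = top_result(results)
```

Because the list is already sorted, no extra sorting is needed to find the best match. Checking for an empty list first avoids an index error when no classifier returns an acceptable result.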

Using the Configure Classifiers Wizard

The Configure Classifiers Wizard allows you to configure the way the classifiers are applied to each document type, and what results are acceptable.

Follow the steps below to configure the wizard:

  1. Add a Classify Document Scope activity to your workflow.
  2. Add one or more classifier activities inside the Classify Document Scope activity.
    1. Give your classifiers suggestive names.
    2. Order the classifiers within the scope, from left to right, in the order of acceptance priority.
    3. Configure your classifiers by selecting Configure Classifiers.
      The Configure Classifiers wizard is displayed.
      Figure 1. Overview of the Configure Classifiers wizard

  3. Select the check boxes for the classifier and document type pairs you want to activate. You might leave a document type unchecked for a certain classifier in one of the following scenarios:
    • The classifier is not trained or configured to identify that particular document type.
    • The classifier does not perform as expected for that particular document type, so any such results it returns should be ignored.
  4. If a classifier has its own taxonomy, use the text boxes next to each check box to map the classifier's taxonomy to your project taxonomy. For example, if Classifier1 has been configured to return class INV for an invoice, but your project taxonomy contains a document type called "Incoming Invoice", then the box corresponding to "Incoming Invoice" and Classifier1 should contain the string INV.
  5. Set a Minimum Confidence threshold, from 0 to 100, for each classifier in the Classify Document Scope. Any classification result with a confidence lower than this threshold will not be stored in the Classify Document Scope activity output.
    Tip: Most document types generate a prediction with a confidence level. Setting this property helps prevent false positives by considering only the predictions whose confidence level is at or above the threshold. To identify an optimal threshold, test various documents within your workflow, record the results (for example, in an Excel spreadsheet), and then analyze which threshold value is the most accurate. Apply the threshold by adjusting the Minimum Confidence property in your current scope.
  6. Select Save once all the classifiers are configured.
    Figure 2. The Configure Classifiers wizard configured to use a different classifier for each document type
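
The effect of the wizard settings can be sketched in Python. This is an illustration of the configuration semantics described in the steps above, not UiPath's implementation. The ClassifierConfig type, the accept and broker functions, the scaling of confidence to a percentage, and the rule that the first classifier in priority order with an acceptable result wins are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# A raw prediction is (class name in the classifier's own taxonomy, confidence 0..1).
Prediction = Tuple[str, float]


@dataclass
class ClassifierConfig:
    """Hypothetical stand-in for one classifier column in the wizard."""
    name: str
    # Project document type -> class name the classifier returns for it.
    # A document type missing from this mapping is treated as unchecked.
    type_mapping: Dict[str, str]
    min_confidence: int  # wizard threshold, from 0 to 100


def accept(config: ClassifierConfig, prediction: Prediction) -> Optional[Tuple[str, float]]:
    """Map a raw prediction to a project document type, or return None if the
    pair is not activated or the confidence is below the threshold."""
    raw_class, confidence = prediction
    for document_type, mapped_class in config.type_mapping.items():
        if mapped_class == raw_class:
            # Assumption: confidence is reported on a 0..1 scale and is scaled
            # to a percentage before comparing with the threshold.
            if confidence * 100 >= config.min_confidence:
                return document_type, confidence
            return None
    return None


def broker(configs: List[ClassifierConfig],
           predictions: Dict[str, Prediction]) -> Optional[Tuple[str, str, float]]:
    """Assumed rule: walk classifiers in priority order and return the first
    accepted result as (classifier name, document type, confidence)."""
    for config in configs:
        prediction = predictions.get(config.name)
        if prediction is None:
            continue
        accepted = accept(config, prediction)
        if accepted is not None:
            return (config.name, *accepted)
    return None


configs = [
    ClassifierConfig("Classifier1", {"Incoming Invoice": "INV"}, min_confidence=70),
    ClassifierConfig("Classifier2", {"Incoming Invoice": "Incoming Invoice"}, min_confidence=50),
]
low = broker(configs, {"Classifier1": ("INV", 0.55), "Classifier2": ("Incoming Invoice", 0.64)})
high = broker(configs, {"Classifier1": ("INV", 0.91), "Classifier2": ("Incoming Invoice", 0.64)})
```

In the first call, the Classifier1 prediction falls below its threshold of 70, so the sketch falls through to Classifier2. In the second call, Classifier1 clears its threshold and, being first in priority order, its result is the one kept.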

Document Understanding Integration

The Classify Document Scope activity is part of the Document Understanding solutions. Visit the Document Understanding Guide for more information.
