Document Understanding Activities
Classify Document Scope
UiPath.IntelligentOCR.Activities.DocumentClassification.ClassifyDocumentScope
Provides a scope for classifier activities, supplying all the inputs they need to perform document classification. The scope accepts one or more classifiers, brokers between them, and forwards all parameters to the child classification activities.
Properties panel
Common
- DisplayName - The display name of the activity.
Input
- DocumentObjectModel - The Document Object Model (DOM) of the document you want to classify. This model is stored in a Document variable and can be retrieved from the Digitize Document activity. Visit Digitize Document to learn how to use the activity. This field supports only Document variables.
- DocumentPath - The path to the document you want to classify. This field supports only strings and String variables.
  Note: The supported file types for this property field are .png, .gif, .jpe, .jpg, .jpeg, .tiff, .tif, .bmp, and .pdf.
- DocumentText - The text of the document itself, stored in a String variable. You can retrieve this value from the Digitize Document activity. Visit Digitize Document to learn how to use this activity. This field supports only strings and String variables.
- Taxonomy - The taxonomy against which the document is processed, stored in a DocumentTaxonomy variable. This field supports only DocumentTaxonomy variables.
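If a workflow builds DocumentPath from user input or a folder listing, it can help to skip unsupported files before they reach the scope. The following Python sketch is a hypothetical pre-check, not part of the UiPath API: the helper name is illustrative, and treating extensions case-insensitively is an assumption about how the activity behaves.

```python
from pathlib import Path

# Extensions listed above as supported for the DocumentPath property.
SUPPORTED_EXTENSIONS = frozenset(
    {".png", ".gif", ".jpe", ".jpg", ".jpeg", ".tiff", ".tif", ".bmp", ".pdf"}
)


def is_supported_document_path(document_path: str) -> bool:
    """Hypothetical helper: True if the file extension is in the supported list.

    Assumption: extensions are compared case-insensitively (".PDF" == ".pdf").
    """
    return Path(document_path).suffix.lower() in SUPPORTED_EXTENSIONS


for path in ["invoices/inv-001.PDF", "scans/receipt.tif", "notes/readme.docx"]:
    print(path, is_supported_document_path(path))
# invoices/inv-001.PDF True
# scans/receipt.tif True
# notes/readme.docx False
```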
Misc
- Private - If selected, the values of variables and arguments are no longer logged at Verbose level.
Output
- ClassificationResults - The results of running the classifiers on the specified document, stored in an IReadOnlyList<ClassificationResult> variable. This field supports only IReadOnlyList<ClassificationResult> variables.
The ClassificationResult object contains the following information:
- DocumentTypeId - The ID corresponding to the document type matched from the Taxonomy.
- DocumentId - The file name of the processed document.
- ContentType - The type of content contained in the processed document.
- Confidence - Classification confidence, expressed as a numeric value between 0 and 1.
- OcrConfidence - OCR confidence for the characters that are part of the reported reference, expressed as a numeric value between 0 and 1.
- Reference - Evidence for the classification, both in the text version of the document (through TextStartIndex and TextLength) and in the Document Object Model (through Tokens and the highlight boxes for each page from which the evidence is selected).
- DocumentBounds - Information on which part of the document the classification pertains to, with StartPage (Int32, 0-based), PageCount (Int32), TextStartIndex (Int32, 0-based), and TextLength (Int32).
- ClassifierName - Automatically populated by the Classify Document Scope activity with the display name of the classifier reporting the current ClassificationResult.
Note: The ClassificationResults list is sorted in descending order by confidence score, so the first entry has the highest confidence.
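To make the shape of this output concrete, the following Python sketch models a ClassificationResult and its DocumentBounds as plain dataclasses. It is an illustration only: the class and field names are hypothetical snake_case stand-ins for a subset of the .NET properties listed above, and the real objects come from the UiPath libraries, not from code like this. It shows how the 0-based StartPage and TextStartIndex values translate into page indices and a text slice, and how descending-confidence order puts the best match first.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentBounds:
    """Hypothetical stand-in for DocumentBounds."""
    start_page: int        # 0-based index of the first page
    page_count: int
    text_start_index: int  # 0-based offset into the document text
    text_length: int

    def page_indices(self) -> range:
        """0-based indices of the pages this classification covers."""
        return range(self.start_page, self.start_page + self.page_count)

    def text_slice(self, document_text: str) -> str:
        """The part of the document text the classification pertains to."""
        end = self.text_start_index + self.text_length
        return document_text[self.text_start_index:end]


@dataclass(frozen=True)
class ClassificationResult:
    """Hypothetical stand-in for a subset of ClassificationResult."""
    document_type_id: str
    classifier_name: str
    confidence: float      # between 0 and 1
    bounds: DocumentBounds


def best_result(results: list[ClassificationResult]) -> ClassificationResult | None:
    """Sort by descending confidence, as the scope output is, and return the top entry."""
    ordered = sorted(results, key=lambda r: r.confidence, reverse=True)
    return ordered[0] if ordered else None


text = "INVOICE No. 42 ... RECEIPT total 9.99"
bounds = DocumentBounds(start_page=0, page_count=2, text_start_index=0, text_length=14)
results = [
    ClassificationResult("receipts", "KeywordClassifier", 0.41, bounds),
    ClassificationResult("invoices", "ProjectClassifier", 0.87, bounds),
]
top = best_result(results)
print(top.document_type_id, list(top.bounds.page_indices()), top.bounds.text_slice(text))
# invoices [0, 1] INVOICE No. 42
```

Because the scope already returns the list in this order, reading the first element of ClassificationResults gives the same entry that the sort selects here.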
The Configure Classifiers wizard lets you configure how the classifiers are applied to each document type and which results are acceptable.
Follow the steps below to configure the wizard:
- Add a Classify Document Scope activity to your workflow.
- Add one or more classifier activities inside the Classify Document Scope activity.
- Give your classifiers descriptive names.
- Order the classifiers within the scope, from left to right, in the order of acceptance priority.
- Configure your classifiers by selecting Configure Classifiers. The wizard opens.
Figure 1. Overview of the Configure Classifiers wizard
- Select the check boxes for the classifier and document type pairs you want to activate. You might leave a document type unchecked for a certain classifier in the following scenarios:
  - The classifier is not trained or configured to identify that particular document type.
  - The classifier does not perform as expected for that particular document type, and any such results it returns should be ignored.
- If a classifier has its own taxonomy, use the text boxes next to each check box to map the classifier's taxonomy to your project taxonomy. For example, if Classifier1 has been configured to return the class INV for an invoice, but your project taxonomy contains a document type called "Incoming Invoice", then the text box for "Incoming Invoice" under Classifier1 should contain the string INV.
- Set a Minimum Confidence threshold, from 0 to 100, for each classifier in the Classify Document Scope. Any classification result with a confidence lower than this threshold is not stored in the Classify Document Scope activity output.
Tip: Most classifiers return their predictions with a confidence level. Setting this property helps prevent false positives, because only predictions with a confidence level at or above the threshold are considered. To find an optimal confidence level, test various documents within your workflow, record the results (in an Excel spreadsheet, for example), and then analyze which threshold value is the most accurate. Apply the threshold by adjusting the Minimum Confidence property in your current scope.
- Select Save once all the classifiers are configured.
Figure 2. The Configure Classifiers wizard configured to use a different classifier for each document type
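The following Python sketch illustrates how the wizard settings described above could act on raw classifier output: results for unchecked classifier and document type pairs are dropped, a classifier's own labels are translated to project document types through the text box mapping, results below the classifier's Minimum Confidence are discarded, and the remaining results are sorted by descending confidence. This is a hypothetical model, not the activity's implementation. All names are illustrative, the 0 to 100 threshold is assumed to correspond to a 0 to 1 confidence divided by 100, and the acceptance priority set by the classifier order is not modeled because its exact effect is not described on this page.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ClassifierConfig:
    """Hypothetical model of one classifier's column in the wizard."""
    name: str
    min_confidence: int = 0  # Minimum Confidence, 0-100
    # Checked document types only: project document type -> label the
    # classifier returns for it (the text box value).
    enabled_types: dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class RawPrediction:
    """Hypothetical raw output of a single classifier."""
    classifier_name: str
    label: str         # class label in the classifier's own taxonomy
    confidence: float  # 0-1


def apply_wizard_config(
    predictions: list[RawPrediction], configs: list[ClassifierConfig]
) -> list[tuple[str, str, float]]:
    """Return (document_type, classifier_name, confidence), highest confidence first."""
    by_name = {c.name: c for c in configs}
    accepted = []
    for p in predictions:
        config = by_name.get(p.classifier_name)
        if config is None:
            continue  # classifier not configured in this scope
        # Reverse lookup: which checked project document type uses this label?
        doc_type = next(
            (t for t, label in config.enabled_types.items() if label == p.label), None
        )
        if doc_type is None:
            continue  # unchecked pair or unmapped label: ignore the result
        # Assumption: a threshold of N on the 0-100 scale means confidence >= N / 100.
        if p.confidence < config.min_confidence / 100:
            continue
        accepted.append((doc_type, p.classifier_name, p.confidence))
    accepted.sort(key=lambda r: r[2], reverse=True)
    return accepted


configs = [
    ClassifierConfig("Classifier1", min_confidence=70, enabled_types={"Incoming Invoice": "INV"}),
    ClassifierConfig("Classifier2", min_confidence=50, enabled_types={"Receipt": "Receipt"}),
]
predictions = [
    RawPrediction("Classifier1", "INV", 0.92),
    RawPrediction("Classifier1", "REC", 0.95),  # REC is not mapped for Classifier1
    RawPrediction("Classifier1", "INV", 0.55),  # below Classifier1's threshold
    RawPrediction("Classifier2", "Receipt", 0.64),
]
print(apply_wizard_config(predictions, configs))
# [('Incoming Invoice', 'Classifier1', 0.92), ('Receipt', 'Classifier2', 0.64)]
```

Keeping the mapping keyed by project document type mirrors the wizard layout, where each text box sits next to a project document type under a given classifier.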