Document Understanding Activities
Load Taxonomy
UiPath.IntelligentOCR.Activities.TaxonomyManagement.LoadTaxonomy
Loads the taxonomy.json file created with the help of the Taxonomy Manager into a variable that can be further used with other activities.
Common
- DisplayName - The display name of the activity.
Misc
- Private - If selected, the values of variables and arguments are no longer logged at Verbose level.
Output
- Taxonomy - The taxonomy you want to load, stored in a DocumentTaxonomy variable. This output can be later used in activities that receive a taxonomy as input.
Important: In case you use an Intel Xe GPU and Taxonomy Manager is not displayed properly, we recommend updating the graphics driver to the latest version. For more information, please visit this page.
- Serialize(): Called on a DocumentTaxonomy object, the Serialize() method returns a JSON representation of the object, so that it can be stored and retrieved for later usage.
- Deserialize(String): The DocumentTaxonomy.Deserialize(jsonString) static extension returns a DocumentTaxonomy object, hydrated with the JSON encoded data passed as a parameter.
- GetFields(String): Called on a DocumentTaxonomy object, the GetFields() method called with a DocumentTypeId string returns a list of fields defined within that document type.
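The round trip described by these three methods can be illustrated with a small stand-alone model. The sketch below is written in Python and does not call the UiPath API. The class names, attribute names, and JSON layout are assumptions made for illustration, and the real taxonomy.json schema may differ.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Field:
    field_id: str
    name: str
    type: str


@dataclass
class DocumentType:
    document_type_id: str
    name: str
    fields: List[Field] = field(default_factory=list)


@dataclass
class Taxonomy:
    """Hypothetical stand-in for a taxonomy object (not the UiPath class)."""
    document_types: List[DocumentType] = field(default_factory=list)

    def serialize(self) -> str:
        # Return a JSON representation that can be stored and reloaded later.
        return json.dumps(asdict(self))

    @staticmethod
    def deserialize(json_string: str) -> "Taxonomy":
        # Rebuild ("hydrate") the object graph from the JSON text.
        raw = json.loads(json_string)
        return Taxonomy(
            document_types=[
                DocumentType(
                    document_type_id=dt["document_type_id"],
                    name=dt["name"],
                    fields=[Field(**f) for f in dt["fields"]],
                )
                for dt in raw["document_types"]
            ]
        )

    def get_fields(self, document_type_id: str) -> List[Field]:
        # Return the fields defined within the given document type.
        for dt in self.document_types:
            if dt.document_type_id == document_type_id:
                return list(dt.fields)
        raise KeyError(document_type_id)
```

In this model, serializing a taxonomy and deserializing the result yields an equal object, which is the property that makes storing a taxonomy for later use safe.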
The Taxonomy Manager can be used to create and edit a Taxonomy file specific to your current automation project. This Taxonomy file contains user-defined document types, organized in Groups and Categories.
You can further use the Taxonomy file by converting it into a .NET data type with the Load Taxonomy activity, and then passing it as input to activities that receive a taxonomy.
The Taxonomy Manager can be accessed only after installing the UiPath.IntelligentOCR.Activities package, version 1.6.0 or higher, as a dependency for your project. Once the package is installed, a Taxonomy Manager button appears on the Ribbon, in the Wizards section.
The Taxonomy Manager window lets you create document types, organized by groups and categories. When opened for the first time in a project, no groups, categories, or document types are defined.
The first step is to create a group or a document type. The difference between the two is that a group involves a hierarchical structure, while a document type can be created as a single file. A complex project typically requires groups, categories, and document types, while a simple project may require only one or two document types.
Creating a Group
To create a group, select the Group button. Once you have chosen a name for your group, save it by selecting Save or by pressing Enter. A document type that you create inside a group also needs a category.
Creating a Category
Once a group is defined and selected, you can create a Category and/or a Document Type within the group, by using their defined buttons. Select Save or use the Enter key to save the configuration.
Creating a Document Type
A Document Type can be created either as part of a group or as a single document. When created inside a group, make sure that the group is selected, then select Document Type.
If the Document Type is created as a single file, then make sure that no group is selected and select Document Type. After selecting Document Type, input a name for the file and select Save.
Selecting an already created Document Type lets you change its name, copy its unique ID to clipboard, reassign it to another group or category, or none of them. You can also input a code for the document type.
The unique ID takes the form Group.Category.Document and can be copied to the clipboard. The Document Type ID code is optional, and you can use it to find your documents or to map your documents to the Document Types that you define in the taxonomy.
Creating Fields
When the Document Type is selected, the Field button becomes available for you to create a new field. Once the Field button is selected, you can enter a name for it and select its type from the dropdown list.
The Field category has two tabs: Details and Rules. The Details tab provides information about the selected field, such as Field Name, assigned hotkey, or field type, while the Rules tab allows you to create rules that need to be fulfilled by the extraction result for the field.
The available field types are the following:
- Text
- Number
- Date - Choosing this type also lets you specify an expected format, which is optional.
  Note: If you want to add an expected format, use a format compliant to MSDN (Microsoft Developer Network). This format may be used by extractors and is used by the Data Extraction Scope activity when trying to parse a Date into its constituent Day, Month, and Year parts.
- Name
- Address
- Set - Choosing this type lets you add multiple values to the field from a pre-established list.
- Boolean
- Table - Choosing this type lets you edit the structure of the table, as you can add columns and edit their name and type.
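For the Date type, the idea of using an expected format to split a value into Day, Month, and Year parts can be sketched as follows. This is an illustration in Python, not the Data Extraction Scope implementation. The helper names are hypothetical, and only three .NET-style tokens (yyyy, MM, dd) are translated, which is an assumption made to keep the example short.

```python
from datetime import datetime
from typing import Optional, Tuple

# Assumption: only these three .NET-style custom date tokens are supported here.
_TOKEN_MAP = [("yyyy", "%Y"), ("MM", "%m"), ("dd", "%d")]


def to_strptime(expected_format: str) -> str:
    """Translate a small subset of .NET-style date tokens to strptime codes."""
    translated = expected_format
    for net_token, py_token in _TOKEN_MAP:
        translated = translated.replace(net_token, py_token)
    return translated


def split_date(text: str, expected_format: str) -> Optional[Tuple[int, int, int]]:
    """Return (day, month, year) if the text matches the format, else None."""
    try:
        parsed = datetime.strptime(text, to_strptime(expected_format))
    except ValueError:
        return None
    return parsed.day, parsed.month, parsed.year
```

For example, with the expected format dd.MM.yyyy, the text 31.12.2023 splits into day 31, month 12, and year 2023, while a value written as 2023-12-31 does not match and yields no parts.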
Details Tab
After the new Field is created, select it to view more information. By default, when you open a field, the Details tab is displayed allowing you to modify the name, color, or hotkey of the field. You can also specify whether it is multi-value (Is multi-value) or if it allows for values with no evidence in the document to be processed (Requires reference). The multi-value option allows for a field to have multiple values, without being restricted to a specific list.
- Is Multi-Value: If a field is set as multi-value, more than one value can be reported for that field. For example, you may want to extract a "List of Directors" that has a variable number of entries. Think of a multi-value field as a single-column table.
- Requires reference: When a field requires reference, you can add a value to it only if you select something from the document you see in Validation Station. For special fields in which you want to capture values that might not be visible in the document, you can switch Requires Reference to Off. As a result, the user can add a value without specifying the place in the document where that value comes from.
You can also select the Type of the field from the dropdown list, or add a Default value. Use the Default value field to define values to be populated in the Extraction Result, in case there is no value for the field identified in the document.
- Read-only: If enabled, the human validator can view any validator notes set on the ExtractionResult field in Validation Station, as a message. If disabled (default state), the human validator can also edit that note in Validation Station and thus communicate back to the robot information about the decision taken.
- Text: If Text is selected, the validator note is displayed as a text message (or editable text when editing is enabled) in Validation Station. The human validator can view, edit, or add a maximum of 200 characters message in Validation Station.
- Options: If you select Options, you can configure a series of radio buttons that the human validator can view and, if not read-only, select in Validation Station. You can add a maximum of 10 options.
Validator notes can be read and set with GetFieldValidatorNotes(<fieldId>) and SetFieldValidatorNotes(<fieldId>, <validatorNote>).
Created fields can be deleted by using the delete button that appears next to them or reordered by using the drag and drop function.
A field can also be deleted from the Details window, by selecting Delete.
By repeating these steps, you can create multiple groups, categories, and document types, which you can then filter by using the Search field.
Rules Tab
Field rules help you optimize the extraction results and automatically validate them when running your workflow. Their role is to increase the extraction efficiency and to help you easily validate the fields that need attention in Validation Station, by highlighting them. You can create multiple rules that apply to one field.
You define a rule by setting the Evaluator type and the Criticality level.
Evaluator Type
Use the Evaluator Type to specify how the defined rules should be evaluated. There are two evaluator types that you can choose from: AND, OR.
Evaluator Type | Description | Example |
---|---|---|
AND | Use this evaluator type when all the rules need to be met. | Rule: Invoice Number starts with A AND ends with X. |
OR | Use this evaluator type when only one of the rules needs to be met. | Rule: Invoice Number starts with A OR is 123. |
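The difference between the two evaluator types can be sketched in a few lines of Python. This is an illustration of the AND/OR semantics only, not the product's rule engine. The `evaluate` function and the `Rule` alias are hypothetical names.

```python
from typing import Callable, Iterable

# A rule is modeled here as any function that accepts the extracted value
# and reports whether the value passes (an assumption for illustration).
Rule = Callable[[str], bool]


def evaluate(value: str, rules: Iterable[Rule], evaluator: str) -> bool:
    """Combine the individual rule results with AND or OR."""
    results = [rule(value) for rule in rules]
    if evaluator == "AND":
        return all(results)  # every rule must pass
    if evaluator == "OR":
        return any(results)  # a single passing rule is enough
    raise ValueError(f"Unknown evaluator type: {evaluator}")


# The two examples from the table, written as rule lists.
starts_with_a_and_ends_with_x = [lambda v: v.startswith("A"), lambda v: v.endswith("X")]
starts_with_a_or_is_123 = [lambda v: v.startswith("A"), lambda v: v == "123"]
```

With the AND list, the invoice number A123X passes and A123 does not. With the OR list, both A123 and 123 pass.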
Criticality Level
Indicates the criticality of all rules defined for a field. You cannot set a MUST level if the rule is broken in the Validation Station session. There are two criticality levels that you can choose from: MUST, SHOULD.
The created rule is triggered once the set criticality level is identified.
When you submit, a MUST rule has to be fully satisfied, otherwise the Submit operation fails. A SHOULD rule allows you to submit even if the rule is broken.
You can always check if a rule is broken by using the helper method from the ExtractionResult class, which resides in the UiPath.DocumentProcessing.Contracts activity package.
Criticality Level | Description |
---|---|
MUST | Use this criticality level when the created rule is imperative to be included in the Extraction Result process. |
SHOULD | Use this criticality level when the created rule is optional. |
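The effect of the two criticality levels on the Submit operation can be modeled as below. The Python sketch illustrates the behavior described in this section and does not reproduce the ExtractionResult helper method. `FieldCheck`, `can_submit`, and `needs_attention` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FieldCheck:
    field_name: str
    criticality: str  # "MUST" or "SHOULD"
    passed: bool      # combined result of the rules defined for the field


def can_submit(checks: List[FieldCheck]) -> bool:
    """Submitting is blocked only by a broken MUST rule."""
    return all(c.passed for c in checks if c.criticality == "MUST")


def needs_attention(checks: List[FieldCheck]) -> List[str]:
    """Fields with broken rules of either level, for example to highlight them."""
    return [c.field_name for c in checks if not c.passed]
```

A broken SHOULD rule therefore appears in the attention list but does not prevent submitting, while a broken MUST rule does both.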
Once you have selected the Evaluator Type and the Criticality Level, you must set a type applicable for your newly created rule. There are several options you can choose from. Here's a complete list with all the available ones:
- Is not empty
- Possible values
- Starts with
- Ends with
- Contains
- Fixed length
- Is email
- RegEx
Note: The field type for which you created a rule dictates the number of possible rule types from which you can select one. For example, a field of type Text displays all the possible rule types, while a field of type Date displays only two rule types, Is not empty and Possible values.
Type of Rules
Rule type | Description | Field type | Criticality level | Evaluator Type |
---|---|---|---|---|
Is not empty | The extracted value cannot be empty, meaning that the field is mandatory. If the value is missing, it requires validation/manual input. | Applicable to fields of the following types: | MUST, SHOULD | AND, OR |
Possible values | User defines all possible values and the extracted data is one of the values added as input while creating the rule (for example, Employer Type is either "full-time", "part-time", or "internship"). | Applicable to fields of the following types: | MUST, SHOULD | AND, OR |
Expression | Define mathematical expressions that act as a rule for extracting data. | Applicable to fields of type number. A condition is required while configuring the rule. Choose one of the following options: A mathematical expression is required. Use the predefined operators to define your expression. Check the following examples: | MUST, SHOULD | AND, OR |
Starts with | This is a fixed rule meaning that the extracted value needs to start with one of the values added by the user. | Applicable to fields of the following types: | MUST, SHOULD | AND, OR |
Ends with | This is a fixed rule meaning that the extracted value needs to end with one of the values added by the user. | Applicable to fields of the following types: | MUST, SHOULD | AND, OR |
Contains | This is a fixed rule meaning that the extracted value needs to contain one of the values added by the user. | Applicable to fields of the following types: | MUST, SHOULD | AND, OR |
Fixed length | This is a fixed rule meaning that the extracted value needs to have a certain fixed length. | Applicable to fields of the following types: | MUST, SHOULD | AND, OR |
Is email | This is a fixed rule meaning that the extracted value needs to be written in an email format. | Applicable to fields of the following types: | MUST, SHOULD | AND, OR |
RegEx | This is a fixed rule meaning that the extracted value needs to match one of the regular expressions added by the user. | Applicable to fields of the following types: | MUST, SHOULD | AND, OR |
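Most of the rule types in the table reduce to simple string checks. The following Python sketch shows one plausible reading of each of them, leaving out Expression. It is not the product's implementation. The function names are hypothetical, and case-sensitive comparison, whole-value regular expression matching, and the simplified email pattern are all assumptions.

```python
import re
from typing import Sequence


def is_not_empty(value: str) -> bool:
    return bool(value.strip())


def possible_values(value: str, allowed: Sequence[str]) -> bool:
    return value in allowed


def starts_with(value: str, prefixes: Sequence[str]) -> bool:
    return value.startswith(tuple(prefixes))


def ends_with(value: str, suffixes: Sequence[str]) -> bool:
    return value.endswith(tuple(suffixes))


def contains(value: str, fragments: Sequence[str]) -> bool:
    return any(fragment in value for fragment in fragments)


def fixed_length(value: str, length: int) -> bool:
    return len(value) == length


# Assumption: a deliberately simple pattern, not a full address validator.
_EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def is_email(value: str) -> bool:
    return _EMAIL.fullmatch(value) is not None


def matches_regex(value: str, patterns: Sequence[str]) -> bool:
    # Assumption: the whole value must match at least one of the patterns.
    return any(re.fullmatch(pattern, value) for pattern in patterns)
```

Each function returns True when the extracted value satisfies the rule, so the results can be combined with an AND or OR evaluator.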
Using Rules
- Select a field in the Taxonomy panel.
- Go to the Rules tab.
- Select Add new, to add a new rule.
- Type in the full address rule, for which the Type is Contains, and the Expression is st, str, street.
- Choose the Evaluator type. In this example, select OR.
- Choose the Criticality level. In this example, select MUST.
- Select Add new.
- Type in the phone number rule, for which the Type is Is not empty.
- Select Add new.
- Type in the city or state rule, for which the Type is Contains, and the Expression is city, state.
The following animated image shows the steps previously described.
Other Options
Editing
You can Edit the name of any group, category, or document type that you have created. This can be done by selecting one of the three levels of configuration and editing the Name field.
Deleting
Groups, categories, and document types can also be deleted. There are two available options:
- Select Delete at the parent element of the object you want to delete.
- Select Delete while the object you want to delete is selected.
In both cases, a pop-up is displayed asking you to confirm the deletion. Select Delete to approve the action.
Customization and Accessibility
A hotkey and color are automatically allocated to the newly created field. You can use them for better visibility and faster navigation through your taxonomy. Customize them by clicking on the hotkey or the color code field.
A customized field with color and hotkey can instruct the Validation Station and the Template Manager to use the assigned color when displaying the field and to use the assigned hotkey as a shortcut for providing values to fields. Visit Validation Station for more information about how to use the field shortcuts to assign values to a field.
To assign a color and a hotkey for a field, select the field, and choose a certain color code in the Color field, and then select a specific hotkey from the Hotkey menu.
Navigate through the Taxonomy Manager by using the keyboard shortcuts. Select Show available keyboard shortcuts and activate the Toggle keyboard shortcuts option to avoid the accidental triggering of the keyboard shortcuts. Nodes can also be collapsed.
taxonomy.json file specific to your project.