The Taxonomy Manager can be used to create and edit a Taxonomy file specific to your current automation project. This Taxonomy file contains user-defined document types, sorted by Groups and Categories.
You can further use the Taxonomy file by converting it into a .NET data type with the Load Taxonomy activity, and then passing it as input for activities such as Classify Document Scope, Train Classifiers Scope, Train Extractors Scope and Present Validation Station.
The Taxonomy Manager is available only after you install the UiPath.IntelligentOCR.Activities package, v1.6.0 or later, as a dependency for your project. Once the package is installed, a Taxonomy Manager button appears on the Ribbon, in the Wizards section.
The Taxonomy Manager window lets you create document types, sorted by groups and categories. When opened for the first time in a project, no groups, categories, or document types are defined.
The first step is to create a group and a category for the document type you want to create. To do this, click the Add Group button next to the Any Group drop-down. After choosing a name for your group, save it by clicking the Save button or by pressing Enter.
Once a group is defined, the Add Category button appears next to the Any Category drop-down. By using the same steps as above, you can create a category.
You can edit the name of any group or category you have created by selecting it and clicking the Edit Group button.
Groups and categories can also be deleted by clicking the Remove Group button. A pop-up asks you to confirm the deletion. Click Yes to proceed.
Once the group and category are defined, you can create the document type by clicking the Add New Document Type button. This opens the Document Type Details tab, where you can choose a name, a group, a category, and a document type code, and add fields to the document type.
Clicking the New Field button displays the Edit Field tab, which lets you choose a name for the field, specify whether it is multi-value (Is Multi-Value) or if it allows for values with no evidence in the document to be processed (Requires Reference), and choose its type.
Some field types have additional options:
- Date - Choosing this type also lets you specify an expected format, which is optional.
- Set - Choosing this type lets you add multiple values to the field.
- Table - Choosing this type lets you edit the structure of the table: you can add columns and edit their names and types.
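To make the field options above concrete, here is a minimal Python sketch of a field definition. The class and attribute names (`FieldDef`, `field_type`, `date_format`, `set_values`, `columns`) are hypothetical; they only mirror the options in the Edit Field tab and are not the structure of `taxonomy.json` or of the UiPath API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FieldDef:
    """Hypothetical field definition mirroring the Edit Field tab options."""

    name: str
    field_type: str                     # e.g. "Text", "Date", "Set", "Table"
    is_multi_value: bool = False        # Is Multi-Value
    requires_reference: bool = False    # Requires Reference
    date_format: Optional[str] = None   # Date only: optional expected format
    set_values: List[str] = field(default_factory=list)      # Set only
    columns: List["FieldDef"] = field(default_factory=list)  # Table only

    def check(self) -> None:
        """Raise ValueError if a type-specific option is set on the wrong type."""
        if self.date_format is not None and self.field_type != "Date":
            raise ValueError(f"{self.name}: expected format applies to Date fields only")
        if self.set_values and self.field_type != "Set":
            raise ValueError(f"{self.name}: values apply to Set fields only")
        if self.columns and self.field_type != "Table":
            raise ValueError(f"{self.name}: columns apply to Table fields only")


invoice_date = FieldDef("Invoice Date", "Date", date_format="dd/MM/yyyy")
currency = FieldDef("Currency", "Set", set_values=["USD", "EUR"])
line_items = FieldDef(
    "Line Items",
    "Table",
    columns=[FieldDef("Description", "Text"), FieldDef("Amount", "Number")],
)
for f in (invoice_date, currency, line_items):
    f.check()  # all three are consistent, so nothing is raised
```

Keeping the type-specific options on one class with a `check()` method is just one way to express "these options only apply to this type"; the Taxonomy Manager enforces this through its UI instead.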
Once the field is configured, clicking Save closes the Edit Field tab and adds the field to the Document Type Details tab. You can repeat this action to add as many fields as you need.
You can edit a created field by clicking it in the Document Type Details tab, and delete or reorder fields with the buttons that appear when you hover over them.
Once the Document Type configuration is complete, clicking Save closes the Document Type Details tab and displays the newly created document type in the main tab.
By repeating these steps, you can create multiple groups, categories, and document types, which you can then filter by using the Search by name field.
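The Search by name field narrows the list of document types. The sketch below shows such a filter, assuming a case-insensitive substring match; the exact matching rule the Taxonomy Manager uses is not documented here, and `search_by_name` is a hypothetical helper.

```python
from typing import List


def search_by_name(doc_type_names: List[str], query: str) -> List[str]:
    """Return the names that contain `query`, ignoring case (assumed matching rule)."""
    needle = query.strip().casefold()
    # An empty query matches every name, so the full list is returned.
    return [name for name in doc_type_names if needle in name.casefold()]


names = ["Invoice", "Receipt", "Purchase Order", "Credit Invoice"]
print(search_by_name(names, "invoice"))  # ['Invoice', 'Credit Invoice']
```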
Once a document type is saved, a Document Type ID is generated for it. Opening the document type for editing displays the ID in the Document Type Details tab. The Document Type ID has a structure of the type
The changes you make in the Taxonomy Manager are automatically saved to the `taxonomy.json` file specific to your project. Once a Document Type is created, simply closing the wizard is enough to save your changes.
The Taxonomy is a collection of document types. A Document Type defines the metadata of a logical document: the definition of a type of file that different business processes must handle.
The Taxonomy organizes the defined Document Types into document Groups and Categories, for easier handling.
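The Group, Category, and Document Type hierarchy can be pictured as a nested dictionary. This Python sketch uses hypothetical document type, group, and category names; it illustrates the organization only and does not reproduce the taxonomy file format.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def organize(doc_types: List[Tuple[str, str, str]]) -> Dict[str, Dict[str, List[str]]]:
    """Nest (name, group, category) tuples as {group: {category: [names]}}."""
    tree: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for name, group, category in doc_types:
        tree[group][category].append(name)
    # Convert to plain dicts so the result prints and compares cleanly.
    return {group: dict(categories) for group, categories in tree.items()}


doc_types = [
    ("Invoice", "Finance", "Payables"),
    ("Receipt", "Finance", "Expenses"),
    ("Credit Note", "Finance", "Payables"),
    ("ID Card", "HR", "Onboarding"),
]
print(organize(doc_types))
# {'Finance': {'Payables': ['Invoice', 'Credit Note'], 'Expenses': ['Receipt']},
#  'HR': {'Onboarding': ['ID Card']}}
```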
Document Types must have unique IDs, in the form
A Document Type is defined by its Name, Group and Category, and the fields associated with it.
The Document Type field must have a unique ID in the form
A Field is one piece of information that is expected to be found and captured from a specific Document Type. A Field may have derived parts: formatted information extracted or edited from the underlying textual value found in a document.
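As an illustration of derived parts, a Date field's textual value can be split into day, month, and year. This sketch assumes the expected format is given as a Python `strptime` pattern; the `derive_date_parts` helper and the part names are hypothetical.

```python
from datetime import datetime
from typing import Dict


def derive_date_parts(raw_value: str, expected_format: str) -> Dict[str, int]:
    """Parse the textual value found in the document into derived parts."""
    parsed = datetime.strptime(raw_value.strip(), expected_format)
    return {"Day": parsed.day, "Month": parsed.month, "Year": parsed.year}


print(derive_date_parts("15/03/2021", "%d/%m/%Y"))  # {'Day': 15, 'Month': 3, 'Year': 2021}
```

The underlying text ("15/03/2021") stays the field's value; the derived parts are formatted information computed from it.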
A Field must have a unique ID in the form
Table columns must have unique IDs in the form
All IDs must be alphanumeric strings containing letters and numbers only.
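A minimal check for this rule could look like the sketch below. It assumes "letters" means ASCII letters and that the check is applied to a single ID value you type, not to a full composite ID; `is_valid_id` is a hypothetical helper.

```python
import re

# Assumption: only ASCII letters and digits count as "letters and numbers".
_ID_PATTERN = re.compile(r"[A-Za-z0-9]+")


def is_valid_id(candidate: str) -> bool:
    """True if `candidate` is non-empty and contains only ASCII letters and digits."""
    return _ID_PATTERN.fullmatch(candidate) is not None


for candidate in ["Invoice01", "invoice_01", "Total Amount", ""]:
    print(repr(candidate), is_valid_id(candidate))
```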
The following rules apply to specific field types:
- Address - An Address field can have derived parts, such as Address Line 1.
- Set - Information that has strictly reported values from a predefined set. A Set field must define the allowed options as values; these are reflected in the Validation Station.
- Boolean - A Boolean field can only have Yes or No as possible values, and is reflected in the Validation Station.
- Table - A Table field contains the definition of its columns, which describe each cell in the table. Table columns are defined as one of the regular field types in the Components list; they cannot be Composed or Table types.
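The Set, Boolean, and table column rules can be expressed as simple checks. This is a sketch under assumptions: the helper names are hypothetical, Boolean values are compared as the literal strings "Yes" and "No", and "Composed" and "Table" are used as the forbidden column type names.

```python
from typing import Iterable, List

# Column types a Table field may not use (from the rule above).
FORBIDDEN_COLUMN_TYPES = {"Composed", "Table"}


def is_valid_set_value(value: str, allowed: Iterable[str]) -> bool:
    """A Set field only accepts one of the options it defines."""
    return value in set(allowed)


def is_valid_boolean_value(value: str) -> bool:
    """A Boolean field only accepts Yes or No."""
    return value in ("Yes", "No")


def invalid_column_types(column_types: List[str]) -> List[str]:
    """Return the column types, in order, that a Table field may not use."""
    return [t for t in column_types if t in FORBIDDEN_COLUMN_TYPES]


print(invalid_column_types(["Text", "Table", "Number", "Composed"]))  # ['Table', 'Composed']
```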
The Taxonomy also contains the list of groups and categories, as well as a collection of supported languages that can be associated with the processed documents. For example, to process documents in Japanese and English, the Supported Languages tag must contain their respective display names and language codes. We recommend also adding an Undetermined Language entry (code `und`) to support exceptional cases.
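A sketch of the supported languages for the Japanese and English example. The list-of-pairs structure and the fallback lookup are hypothetical illustrations, not the `taxonomy.json` schema or documented Taxonomy behavior; the codes are assumed to be ISO 639 codes, where `und` is the ISO 639-2 code for an undetermined language.

```python
from typing import List, NamedTuple


class Language(NamedTuple):
    name: str  # display name
    code: str  # language code (ISO 639 codes assumed)


supported_languages: List[Language] = [
    Language("Japanese", "ja"),
    Language("English", "en"),
    Language("Undetermined", "und"),  # catch-all for exceptional cases
]


def resolve_language(code: str, languages: List[Language]) -> Language:
    """Hypothetical lookup: return the matching entry, else the undetermined one."""
    by_code = {lang.code: lang for lang in languages}
    return by_code.get(code, by_code["und"])


print(resolve_language("fr", supported_languages).name)  # Undetermined
```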
Called on a `DocumentTaxonomy` object, the `Serialize()` method returns a JSON representation of the object, so that it can be stored and retrieved for later use.
The `DocumentTaxonomy.Deserialize(jsonString)` static extension returns a `DocumentTaxonomy` object hydrated with the JSON-encoded data passed as a parameter.
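`Serialize()` and `Deserialize()` form a standard JSON round trip. The real `DocumentTaxonomy` API is .NET; the Python sketch below only mimics the round-trip pattern with a hypothetical `Taxonomy` class and made-up keys, so it does not reproduce the actual serialized format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class Taxonomy:
    """Hypothetical, simplified stand-in for DocumentTaxonomy."""

    groups: List[str] = field(default_factory=list)
    categories: List[str] = field(default_factory=list)
    # document type ID -> field IDs (simplified for the sketch)
    document_types: Dict[str, List[str]] = field(default_factory=dict)

    def serialize(self) -> str:
        """Analogue of Serialize(): return a JSON string for storage."""
        return json.dumps(asdict(self))

    @staticmethod
    def deserialize(json_string: str) -> "Taxonomy":
        """Analogue of Deserialize(jsonString): rebuild an object from JSON."""
        return Taxonomy(**json.loads(json_string))


original = Taxonomy(["Finance"], ["Payables"], {"invoice": ["total", "date"]})
restored = Taxonomy.deserialize(original.serialize())
print(restored == original)  # True
```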
Called on a `DocumentTaxonomy` object with a `DocumentTypeId` string, the `GetFields()` method returns a list of the fields defined within that document type.
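Conceptually, `GetFields()` is a lookup from a document type ID to that type's fields. The Python analogue below uses invented IDs and field names. It raises `KeyError` for an unknown ID, which is a choice made for this sketch only; how the real method handles unknown IDs is not described here.

```python
from typing import Dict, List

# Hypothetical taxonomy content: document type ID -> field names.
FIELDS_BY_DOC_TYPE: Dict[str, List[str]] = {
    "invoice": ["Invoice Number", "Invoice Date", "Total"],
    "receipt": ["Merchant", "Total"],
}


def get_fields(document_type_id: str) -> List[str]:
    """Return a copy of the fields defined in the given document type."""
    return list(FIELDS_BY_DOC_TYPE[document_type_id])


print(get_fields("receipt"))  # ['Merchant', 'Total']
```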
The `taxonomy.json` file is generated in an automation project when you open the Taxonomy Manager for the first time. The file is always located in the `DocumentProcessing` folder at the root of the project. In the Taxonomy Manager, you can see the exact location of the file by hovering over the button. Alternatively, each time you open the Taxonomy Manager, a pop-up message in the upper-right corner shows the location of the file.
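If a script needs to find the file, its path can be built from the project root. A minimal sketch with `pathlib`; `taxonomy_path` is a hypothetical helper and `MyProject` is a placeholder project folder.

```python
from pathlib import Path


def taxonomy_path(project_root: str) -> Path:
    """Return <project root>/DocumentProcessing/taxonomy.json."""
    return Path(project_root) / "DocumentProcessing" / "taxonomy.json"


print(taxonomy_path("MyProject").as_posix())  # MyProject/DocumentProcessing/taxonomy.json
```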
The `taxonomy.json` file is unique to each project, but you can reuse it by copying it manually into a new project. To do so, create the new project, open the Taxonomy Manager so that a new file is generated, then go to the project folder and replace that file with the taxonomy of your choice.
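The manual copy step could be scripted as follows. This is a minimal sketch, assuming both projects use the `DocumentProcessing/taxonomy.json` layout and that the Taxonomy Manager has already been opened once in the target project; `reuse_taxonomy` and the example paths are hypothetical.

```python
import shutil
from pathlib import Path


def reuse_taxonomy(source_project: str, target_project: str) -> Path:
    """Overwrite the target project's taxonomy.json with the source project's copy."""
    source = Path(source_project) / "DocumentProcessing" / "taxonomy.json"
    target = Path(target_project) / "DocumentProcessing" / "taxonomy.json"
    # The target file should already exist: open the Taxonomy Manager there first.
    if not target.exists():
        raise FileNotFoundError(f"{target} not found; open the Taxonomy Manager in the new project first")
    shutil.copyfile(source, target)
    return target


# Example call (placeholder paths):
# reuse_taxonomy("C:/Projects/OldProject", "C:/Projects/NewProject")
```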