UiPath Activities

Taxonomy Manager

The Taxonomy Manager can be used to create and edit a Taxonomy file specific to your current automation project. This Taxonomy file contains user-defined document types, sorted by Groups and Categories.

You can further use the Taxonomy file by converting it into a .NET data type with the Load Taxonomy activity, and then passing it as input for activities such as Classify Document Scope, Train Classifiers And Extractors, and Present Validation Station.

The Taxonomy Manager is available only after you install the UiPath.IntelligentOCR.Activities package, version 1.6.0 or higher, as a dependency of your project. Once the package is installed, a Taxonomy Manager button appears on the Ribbon, in the Wizards section.

How to Use the Taxonomy Manager

The Taxonomy Manager window lets you create document types, sorted by groups and categories. When opened for the first time in a project, no groups, categories, or document types are defined.

The first step is to create a group and a category for the document type you want to create. This is done by using the Add Group button next to the Any Group drop-down. Once you have chosen a name for your group, you can save it by using the Save button or by simply hitting Enter.

Once a group is defined, the Add Category button appears next to the Any Category drop-down. By using the same steps as above, you can create a category.

You can edit the name of any group or category that you have created by selecting it and clicking the Edit Group button.

Groups and categories can also be deleted by clicking the Remove Group button. A pop-up asks you to confirm the deletion. Click Yes to approve it.

Once the group and category are defined, you can move on to creating the Document Type. This can be done by clicking the Add New Document Type button. Doing this displays the Document Type Details tab, which enables you to choose a name, a group, a category, and a document type code, as well as add fields to the document type.

Clicking the New Field button displays the Edit Field tab, which lets you choose a name for the field, specify whether it is multi-value, and choose its type. The available field types are:

  • Text
  • Number
  • Date - Choosing this type also lets you specify an expected format, which is optional.
  • Name
  • Address
  • Keyword
  • Set - Choosing this type lets you add multiple values to the field.
  • Boolean
  • Table - Choosing this type lets you edit the structure of the table, as you can add columns and edit their name and type.

Once you have configured your field, clicking Save closes the Edit Field tab and adds the field to the Document Type Details tab. You can repeat this action to add as many fields as you need.

Created fields can be edited by clicking them in the Document Type Details tab, and they can also be deleted and reordered by using the buttons that appear next to them when hovered.

Once the Document Type configuration is complete, clicking Save closes the Document Type Details tab and displays the newly created document type in the main tab.

By repeating these steps, you can create multiple groups, categories, and document types, which you can then filter by using the Search by name field.

Once a document type is saved, a Document Type ID is generated for it. Opening the document type for editing displays the ID in the Document Type Details tab. The Document Type ID has the structure Group.Category.Document.

Note:

The changes you make in the Taxonomy Manager are automatically saved into the taxonomy.json file specific to your project. Once a Document Type is created, simply closing the wizard is enough to save your changes.

Taxonomy

The Taxonomy is a collection of document types. A Document Type defines the metadata for what a logical document means: a definition of a type of file that must be handled by different business processes.

The Taxonomy organizes the defined Document Types into document Groups and Categories, for easier handling.

Document Types must have unique IDs, in the form <groupID>.<categoryID>.<docTypeID>.

A Document Type is defined by its Name, Group and Category, and the fields associated with it.
The Document Type field must have a unique ID in the form <groupID>.<categoryID>.<docTypeID>.DocumentType.

A Field is one piece of information that is expected to be found and captured from a specific Document Type. A Field may have derived parts: formatted information extracted or edited from the underlying textual value found in a document.
A Field must have a unique ID in the form <groupID>.<categoryID>.<docTypeID>.<fieldID>.
Table columns must have unique IDs in the form <groupID>.<categoryID>.<docTypeID>.<fieldID>.Body.<columnID>.

All IDs must be alphanumeric strings containing letters and numbers only.
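The ID shapes listed above can be illustrated with a small parsing sketch. This is only an illustration in Python, not part of the UiPath API: the `TaxonomyId` type and the `parse_taxonomy_id` function are hypothetical names, and the sketch assumes that the individual segments never contain a dot.

```python
from typing import NamedTuple, Optional


class TaxonomyId(NamedTuple):
    """Named segments of a dot-separated taxonomy ID (hypothetical helper type)."""
    group: str
    category: str
    doc_type: str
    field: Optional[str] = None
    column: Optional[str] = None


def parse_taxonomy_id(identifier: str) -> TaxonomyId:
    """Split a taxonomy ID into its segments, based on the documented shapes."""
    parts = identifier.split(".")
    # <groupID>.<categoryID>.<docTypeID> or ...<fieldID>
    if len(parts) in (3, 4):
        return TaxonomyId(*parts)
    # <groupID>.<categoryID>.<docTypeID>.<fieldID>.Body.<columnID>
    if len(parts) == 6 and parts[4] == "Body":
        group, category, doc_type, field, _, column = parts
        return TaxonomyId(group, category, doc_type, field, column)
    raise ValueError(f"Unrecognized taxonomy ID shape: {identifier!r}")


print(parse_taxonomy_id("TestGroup.TestCategory.Invoice.ATable.Body.Column1Text"))
```

With this shape, the Document Type field ID (ending in `.DocumentType`) parses like any other four-segment field ID, with `DocumentType` as the field segment.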

| Field Type | Allows Multi-Value | Purpose | Derived Parts | Additional Information |
|---|---|---|---|---|
| Text | Yes | Textual information | N/A | N/A |
| Number | Yes | Numeric values | Value | N/A |
| Date | Yes | Dates | Day, Month, Year | N/A |
| Name | Yes | Person names | Given Name, Middle Name, Last Name | N/A |
| Address | Yes | Addresses | Address Line 1, Address Line 2, Address Line 3, City, State / County / Province, Country, Zip Postal Code | N/A |
| Set | Yes | Information that has strict reported values from a predefined set | N/A | A Set field must define the allowed options as values. These are reflected in the Validation Station. |
| Boolean | Yes | True/False values | N/A | A Boolean field can only have Yes or No as possible values, and is reflected in the Validation Station. |
| Table | No | Structured data | N/A | A Table field contains the definition of the columns. |
| Table Column | No | Each cell in the table | N/A | Table Columns in a Table field are defined as one of the regular fields in the Components list. They cannot be Composed or Table types. |

The Taxonomy also contains the list of groups and categories, as well as a collection of supported languages that can be associated with the processed documents. For example, to process documents in Japanese and English, the Supported Languages tag must contain the display name and language code of each. Adding an Undetermined Language entry (code und) is recommended, to support exceptional cases.
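As a rough illustration of the language list, the Python sketch below adds missing language entries to a taxonomy that has already been loaded as a dictionary. The `SupportedLanguages`, `Name`, and `Code` keys follow the sample taxonomy in this document. The `ensure_language` helper is a hypothetical name and not part of any UiPath package.

```python
def ensure_language(taxonomy: dict, name: str, code: str) -> bool:
    """Add a language entry if its code is not listed yet; return True if added."""
    languages = taxonomy.setdefault("SupportedLanguages", [])
    if any(lang.get("Code") == code for lang in languages):
        return False
    languages.append({"Name": name, "Code": code})
    return True


# A minimal taxonomy fragment that only supports English so far.
taxonomy = {"SupportedLanguages": [{"Name": "English", "Code": "eng"}]}
ensure_language(taxonomy, "Japanese", "jpn")
ensure_language(taxonomy, "Undetermined Language", "und")
print([lang["Code"] for lang in taxonomy["SupportedLanguages"]])
# prints ['eng', 'jpn', 'und']
```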

Below you can find a sample taxonomy document, serialized as JSON, which contains one document type and different types of fields:


{
	"DocumentTypes": [{
			"DocumentTypeId": "TestGroup.TestCategory.TestDocumentTypeWithAllFields",
			"Group": "Test Group",
			"Category": "Test Category",
			"Name": "Test Document Type With All Fields",
			"OptionalUniqueIdentifier": "TST",
			"TypeField": {
				"FieldId": "TestGroup.TestCategory.TestDocumentTypeWithAllFields.DocumentType",
				"FieldName": "Document Type"
			},
			"Fields": [{
					"FieldId": "TestGroup.TestCategory.TestDocumentTypeWithAllFields.SingleValueText",
					"FieldName": "Single Value Text",
					"IsMultiValue": false,
					"Type": 0,
					"DeriveFieldsFormat": "",
					"Components": [],
					"SetValues": []
				}, {
					"FieldId": "TestGroup.TestCategory.TestDocumentTypeWithAllFields.Multi-ValueText",
					"FieldName": "Multi-Value Text",
					"IsMultiValue": true,
					"Type": 0,
					"DeriveFieldsFormat": "",
					"Components": [],
					"SetValues": []
				}, {
					"FieldId": "TestGroup.TestCategory.TestDocumentTypeWithAllFields.SingleValueDate",
					"FieldName": "Single Value Date",
					"IsMultiValue": false,
					"Type": 2,
					"DeriveFieldsFormat": "dd/MM/yyyy",
					"Components": [],
					"SetValues": []
				}, {
					"FieldId": "TestGroup.TestCategory.TestDocumentTypeWithAllFields.Multi-ValueNumber",
					"FieldName": "Multi-Value Number",
					"IsMultiValue": true,
					"Type": 1,
					"DeriveFieldsFormat": "",
					"Components": [],
					"SetValues": []
				}, {
					"FieldId": "TestGroup.TestCategory.TestDocumentTypeWithAllFields.AName",
					"FieldName": "A Name",
					"IsMultiValue": false,
					"Type": 3,
					"DeriveFieldsFormat": "",
					"Components": [],
					"SetValues": []
				}, {
					"FieldId": "TestGroup.TestCategory.TestDocumentTypeWithAllFields.AnAddress",
					"FieldName": "An Address",
					"IsMultiValue": false,
					"Type": 4,
					"DeriveFieldsFormat": "",
					"Components": [],
					"SetValues": []
				}, {
					"FieldId": "TestGroup.TestCategory.TestDocumentTypeWithAllFields.ASetWithThreeOptions",
					"FieldName": "A Set With Three Options",
					"IsMultiValue": false,
					"Type": 6,
					"DeriveFieldsFormat": "",
					"Components": [],
					"SetValues": ["Option 1", "Option 2", "Option 3"]
				}, {
					"FieldId": "TestGroup.TestCategory.TestDocumentTypeWithAllFields.ABooleanYesNo",
					"FieldName": "A Boolean YesNo",
					"IsMultiValue": false,
					"Type": 7,
					"DeriveFieldsFormat": "",
					"Components": [],
					"SetValues": ["Yes", "No"]
				}, {
					"FieldId": "TestGroup.TestCategory.TestDocumentTypeWithAllFields.ATable",
					"FieldName": "A Table",
					"IsMultiValue": false,
					"Type": 9,
					"DeriveFieldsFormat": "",
					"Components": [{
							"FieldId": "TestGroup.TestCategory.TestDocumentTypeWithAllFields.ATable.Body.Column1Text",
							"FieldName": "Column 1 Text",
							"IsMultiValue": false,
							"Type": 0,
							"DeriveFieldsFormat": "",
							"Components": [],
							"SetValues": []
						}, {
							"FieldId": "TestGroup.TestCategory.TestDocumentTypeWithAllFields.ATable.Body.Column2Date",
							"FieldName": "Column 2 Date",
							"IsMultiValue": false,
							"Type": 2,
							"DeriveFieldsFormat": "",
							"Components": [],
							"SetValues": []
						}, {
							"FieldId": "TestGroup.TestCategory.TestDocumentTypeWithAllFields.ATable.Body.Column3Number",
							"FieldName": "Column 3 Number",
							"IsMultiValue": false,
							"Type": 1,
							"DeriveFieldsFormat": "",
							"Components": [],
							"SetValues": []
						}, {
							"FieldId": "TestGroup.TestCategory.TestDocumentTypeWithAllFields.ATable.Body.Column4Name",
							"FieldName": "Column 4 Name",
							"IsMultiValue": false,
							"Type": 3,
							"DeriveFieldsFormat": "",
							"Components": [],
							"SetValues": []
						}, {
							"FieldId": "TestGroup.TestCategory.TestDocumentTypeWithAllFields.ATable.Body.Column5Address",
							"FieldName": "Column 5 Address",
							"IsMultiValue": false,
							"Type": 4,
							"DeriveFieldsFormat": "",
							"Components": [],
							"SetValues": []
						}
					],
					"SetValues": []
				}
			]			
		}
	],
	"Groups": [{
			"Name": "CustomGroup",
			"Categories": ["CustomCategory"]
		}, {
			"Name": "Test Group", 
			"Categories": ["Test Category"]	
		}
	],
	"SupportedLanguages": [{
			"Name": "English",
			"Code": "eng"
		}, {
			"Name": "Japanese",
			"Code": "jpn"
		}, {
			"Name": "Undetermined Language",
			"Code": "und"
		}
	]
}
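
To make the structure of the sample easier to read, the following Python sketch walks the `Fields` list of a document type and prints each field with its type, descending into table columns through `Components`. The numeric type codes are inferred from the sample above only and are an assumption, not an official enumeration. Codes 5 and 8 do not appear in the sample, so they are left unmapped rather than guessed. The `describe_fields` helper and the trimmed `SAMPLE` string are illustrative.

```python
import json

# Type codes as they appear in the sample taxonomy (inferred, not official).
FIELD_TYPE_NAMES = {
    0: "Text", 1: "Number", 2: "Date", 3: "Name",
    4: "Address", 6: "Set", 7: "Boolean", 9: "Table",
}


def describe_fields(fields, indent=0):
    """Return one line per field, with nested components indented below it."""
    lines = []
    for field in fields:
        code = field["Type"]
        type_name = FIELD_TYPE_NAMES.get(code, f"Unknown({code})")
        multi = " (multi-value)" if field.get("IsMultiValue") else ""
        lines.append(" " * indent + f"{field['FieldName']}: {type_name}{multi}")
        lines.extend(describe_fields(field.get("Components", []), indent + 2))
    return lines


# A trimmed excerpt of the sample taxonomy, kept short for illustration.
SAMPLE = """
{"DocumentTypes": [{"Fields": [
  {"FieldName": "Multi-Value Text", "IsMultiValue": true, "Type": 0, "Components": []},
  {"FieldName": "A Table", "IsMultiValue": false, "Type": 9, "Components": [
    {"FieldName": "Column 2 Date", "IsMultiValue": false, "Type": 2, "Components": []}
  ]}
]}]}
"""

for doc_type in json.loads(SAMPLE)["DocumentTypes"]:
    print("\n".join(describe_fields(doc_type["Fields"])))
# Multi-Value Text: Text (multi-value)
# A Table: Table
#   Column 2 Date: Date
```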

Taxonomy Extension Methods

Serialize()

Called on a DocumentTaxonomy object, the Serialize() method returns a JSON representation of the object, so that it can be stored and retrieved for later usage.

Deserialize(String)

The DocumentTaxonomy.Deserialize(jsonString) static extension returns a DocumentTaxonomy object, hydrated with the JSON encoded data passed as a parameter.

GetFields(String)

Called on a DocumentTaxonomy object, the GetFields() method, called with a DocumentTypeId string, returns a list of fields defined within that document type.
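
The three extension methods above belong to the .NET activities package and are not reproduced here. As a loose analogy only, the Python sketch below shows the same ideas on the raw taxonomy JSON: a serialize and deserialize round trip with the standard `json` module, and a field lookup by `DocumentTypeId`. The `get_fields` function is a hypothetical helper. Raising `KeyError` for an unknown ID is a choice of this sketch and may not match what the .NET method does.

```python
import json


def get_fields(taxonomy: dict, document_type_id: str) -> list:
    """Return the field definitions of the document type with the given ID."""
    for doc_type in taxonomy.get("DocumentTypes", []):
        if doc_type.get("DocumentTypeId") == document_type_id:
            return doc_type.get("Fields", [])
    raise KeyError(f"No document type with ID {document_type_id!r}")


taxonomy = {
    "DocumentTypes": [{
        "DocumentTypeId": "TestGroup.TestCategory.Invoice",
        "Fields": [{"FieldId": "TestGroup.TestCategory.Invoice.Total",
                    "FieldName": "Total", "Type": 1}],
    }],
    "Groups": [{"Name": "Test Group", "Categories": ["Test Category"]}],
}

# Round trip: JSON text out, equivalent dictionary back in.
json_text = json.dumps(taxonomy, indent=2)
restored = json.loads(json_text)
assert restored == taxonomy

names = [f["FieldName"] for f in get_fields(restored, "TestGroup.TestCategory.Invoice")]
print(names)  # prints ['Total']
```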

The taxonomy.json File

The taxonomy.json file is generated in an automation project when you open the Taxonomy Manager for the first time. The file is always located in the DocumentProcessing folder, at the root of the project. You can see the exact location of the file in the Taxonomy Manager by hovering over the info button. In addition, each time you open the Taxonomy Manager, a pop-up message appears in the upper right corner, informing you of the location of the file.

The taxonomy.json file is unique to each project, but it can be reused if you manually copy it over to a new project. To do so, create a new project, open the Taxonomy Manager so that a new file is created, then go to the project folder and replace that file with the taxonomy of your choice.
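
The manual copy step can also be scripted. The Python sketch below is one possible way to do it, assuming the DocumentProcessing/taxonomy.json location described above. The `reuse_taxonomy` function and the example paths are hypothetical. The function refuses to run if the target file does not exist yet, mirroring the advice to open the Taxonomy Manager in the new project first.

```python
import shutil
from pathlib import Path


def reuse_taxonomy(source_file: Path, target_project: Path) -> Path:
    """Overwrite the new project's taxonomy.json with an existing taxonomy file."""
    target = target_project / "DocumentProcessing" / "taxonomy.json"
    if not target.exists():
        # The file is generated when the Taxonomy Manager is first opened.
        raise FileNotFoundError(
            f"{target} not found; open the Taxonomy Manager in the new project first"
        )
    shutil.copyfile(source_file, target)
    return target


# Example call with placeholder paths (adjust to your own projects):
# reuse_taxonomy(Path("OldProject/DocumentProcessing/taxonomy.json"), Path("NewProject"))
```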

Updated 4 months ago

