Document Understanding User Guide

Last updated Apr 4, 2025

Create and Configure Fields

Fields can be renamed. To rename a field, click the Edit field button and edit the name at the top of the window.

If you later decide not to use certain fields for training an ML model, you can either delete them or hide them by selecting the Hidden checkbox in the Edit field window.

Note: A maximum of 300 fields can be created.

Column Fields

The Description or Unit Price of a line item on an invoice document would be examples of Column fields.

Create a New Column Field

Click in the table section at the top of the page to add a new Column field. The Create Column Field window is displayed.
Fill in a unique name for the field in the Enter unique field name field. The field does not accept uppercase letters. It can only contain lowercase letters, numbers, underscore _ and dash -.
Click OK. The Edit Field window is displayed with the General tab open.
From the Content Type drop-down, select the content type.
Click the Hotkey field and press a key on your keyboard to automatically populate it.
Select the Split items checkbox if you want this field to be used as a delimiter between line items or rows in a table. Any line on which this field appears is considered to be a new line item or row in the table. Most commonly this is used on Line Amount fields on Invoice line items.
Select the Hidden checkbox if you do not want this field to be part of exported datasets.
Click on the Advanced tab.
From the Scoring drop-down, select the measure used to determine accuracy when running evaluations of model predictions.
In the Color field, fill in the hex code of the desired field color.
Click Save to save your settings.
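
The effect of the Split items checkbox can be illustrated with a minimal sketch. It assumes a simplified input: the text lines of a table in reading order, each given as a mapping from field name to the value labeled on that line. The function name `group_line_items`, the input shape, and the choice to attach lines without the split field to the current row are all assumptions made for illustration, not the product's implementation.

```python
from typing import Dict, List


def group_line_items(lines: List[Dict[str, str]], split_field: str) -> List[Dict[str, str]]:
    """Group labeled text lines into line items (table rows).

    Every line on which `split_field` appears starts a new line item.
    Lines without it are merged into the item currently being built.
    """
    items: List[Dict[str, str]] = []
    for line in lines:
        if split_field in line or not items:
            items.append({})  # start a new row
        for field, value in line.items():
            if field in items[-1]:
                # the same field continues on a following text line
                items[-1][field] += " " + value
            else:
                items[-1][field] = value
    return items


# Hypothetical labeled lines from an invoice table.
lines = [
    {"description": "Widget A", "line_amount": "10.00"},
    {"description": "extended warranty"},
    {"description": "Widget B", "line_amount": "25.50"},
]
rows = group_line_items(lines, split_field="line_amount")
print(len(rows))  # 2
```

Here the second text line carries no Line Amount, so it is folded into the first row rather than starting a new one.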

Edit a Column Field

Click the Edit field button. The available options for column fields can be found in the table below.

Option	Tab	Description
Field name	n/a	The unique name for the field. The field does not accept uppercase letters. It can only contain lowercase letters, numbers, underscore `_` and dash `-`.
Content type	General	The content type of a field: string: appropriate for company names or addresses, as well as payment terms, or for any other field where the RPA developer prefers to build the parsing or formatting logic manually in the RPA workflow. number: appropriate for amounts or quantities, with intelligent parsing of the decimal/thousands separators. date: the model parses, formats and unifies the output in a yyyy-mm-dd format. phone: appropriate for phone numbers. Formatting removes letters and parentheses, and replaces spaces with dashes. id-no: appropriate for alphanumeric codes and ID numbers. It is similar to the string content type, but also removes any characters that come before a colon `:`. If the ID number you need to extract might contain colon `:` characters, use string as the content type instead to avoid data loss.
Shortcut	General	The shortcut key for the field. One or two keys allowed.
Split items	General	Select this checkbox if you want this field to be used as a delimiter between line items or rows in a table. Any line on which this field appears is considered to be a new line item or row in the table. Most commonly, this is used on Line Amount fields on Invoice line items.
Hidden	General	Select this checkbox if you do not want this field to be part of exported datasets.
Color	Advanced	The color for the field in hex format. If the value is not valid, a new one is generated.
Scoring	Advanced	The measure used to determine accuracy when running evaluations of model predictions. It can only be configured for string content type. All other content types use an Exact Match scoring strategy. Options: exact match: a prediction is only deemed to be correct (score of 1) if it exactly matches the true value. If it differs by even a single character, then it is deemed to be incorrect (score of 0). levenshtein: a prediction is deemed to be partially correct according to the Levenshtein distance between the prediction and the true value. If a 10-letter value is predicted correctly, except for the last 2 characters, then the score of that prediction will be 0.8.
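
The two scoring options in the table above can be compared with a short worked example. This is a sketch, not the product's implementation: it assumes the Levenshtein score is computed as 1 minus the edit distance divided by the length of the longer string, which reproduces the 0.8 figure quoted in the table. The function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def exact_match_score(prediction: str, truth: str) -> float:
    """1 if the strings are identical, otherwise 0."""
    return 1.0 if prediction == truth else 0.0


def levenshtein_score(prediction: str, truth: str) -> float:
    """Partial credit; the normalization by the longer length is an assumption."""
    longest = max(len(prediction), len(truth))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(prediction, truth) / longest


truth = "ACME-12345"       # 10 characters
prediction = "ACME-123XX"  # last 2 characters are wrong
print(exact_match_score(prediction, truth))  # 0.0
print(levenshtein_score(prediction, truth))  # 0.8
```

The same prediction scores 0 under exact match and 0.8 under levenshtein, which is why the latter is often the more forgiving choice for long string fields.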

Delete a Column Field

To delete a column field, follow these steps:

Click the Edit field button corresponding to the column field you want to delete.
Click the Delete button.
Type the exact name of the field.
Click OK.
The column field and its associated labeled data are deleted.

Regular Fields

These are fields which appear only once on a given document. An Invoice Number or Total Amount on an invoice document would be examples of Regular fields.

Create a New Regular Field

Click on the right pane in the Regular Fields section. The Create Regular Field window is displayed.
Fill in a unique name for the field in the Enter unique field name field. The field does not accept uppercase letters. It can only contain lowercase letters, numbers, underscore _ and dash -.
Click Create. The Edit Field window is displayed.
Select the content type from the Content Type drop-down.
From the Post processing drop-down, select the post-processing mechanism to apply when the model predicts more than one instance of a field on a given page.
Click the Hotkey field and press a key on your keyboard to automatically populate it.
In the Color field, fill in the hex code of the desired field color.
From the Multi page drop-down, select the data retrieval strategy. This option applies when a field appears on several pages of a multi-page document, and it defines which occurrence the model returns.
From the Scoring drop-down, select the measure used to determine accuracy when running evaluations of model predictions.
Select the Multi line checkbox if the field might span multiple text lines, such as addresses or descriptions. If this option is not selected, only the first line is returned.
Select the Hidden checkbox if you do not want this field to be part of exported datasets.
Click Save to save your settings.
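
To give a feel for what the choice of content type implies, the following sketch mimics the formatting described for the phone and id-no content types. It is an illustration only: the function names are hypothetical, and the sketch assumes that id-no cleaning drops everything up to and including the first colon, which the guide does not specify.

```python
import re


def format_phone(raw: str) -> str:
    """Remove letters and parentheses, then replace spaces with dashes."""
    cleaned = re.sub(r"[A-Za-z()]", "", raw).strip()
    return re.sub(r"\s+", "-", cleaned)


def format_id_no(raw: str) -> str:
    """Drop any characters before a colon (assumed: the first colon)."""
    return raw.split(":", 1)[-1].strip()


print(format_phone("Tel (555) 123 4567"))     # 555-123-4567
print(format_id_no("Invoice No: INV-00123"))  # INV-00123
print(format_id_no("A1:B2"))                  # B2
```

The last call shows the data loss the guide warns about: an ID that itself contains a colon loses its first part, so string is the safer content type for such values.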

Edit a Regular Field

Click the Edit field button. The available options for regular fields can be found in the table below.

Option	Tab	Description
Field name	n/a	The unique name for the field. The field does not accept uppercase letters. It can only contain lowercase letters, numbers, underscore `_` and dash `-`.
Content type	General	The content type of a field: string: appropriate for company names or addresses, as well as payment terms, or for any other field where the RPA developer prefers to build the parsing or formatting logic manually in the RPA workflow. number: appropriate for amounts or quantities, with intelligent parsing of the decimal/thousands separators. date: the model parses, formats and unifies the output in a yyyy-mm-dd format. phone: appropriate for phone numbers. Formatting removes letters and parentheses, and replaces spaces with dashes. id-no: appropriate for alphanumeric codes and ID numbers. It is similar to the string content type, but also removes any characters that come before a colon `:`. If the ID number you need to extract might contain colon `:` characters, use string as the content type instead to avoid data loss.
Post processing	Advanced	The post-processing mechanism. If the model predicts more than one instance of a field on a given page, the model returns: highest_confidence: the value with the highest confidence. first_span: the first value. largest_value: the largest numeric value. This is only displayed for number content type and is appropriate for Total Amount fields. longest_value: the value consisting of the largest number of characters.
Shortcut	General	The shortcut key for the field. One or two keys allowed.
Color	Advanced	The color for the field in hex format. If the value is not valid, a new one is generated.
Multi page	Advanced	The data return strategy in case a field appears on different pages of a multi-page document. highest_confidence: the default choice for string, phone, and number content types. first_occurrence: the default choice for id-no and date content types. last_occurrence. longest_string. shortest_string. highest_num_value: only displayed for number content type. lowest_num_value: only displayed for number content type.
Scoring	Advanced	The measure used to determine accuracy when running evaluations of model predictions. It can only be configured for string content type. All other content types use an Exact Match scoring strategy. Options: exact match: a prediction is only deemed to be correct (score of 1) if it exactly matches the true value. If it differs by even a single character, then it is deemed to be incorrect (score of 0). levenshtein: a prediction is deemed to be partially correct according to the Levenshtein distance between the prediction and the true value. If a 10-letter value is predicted correctly, except for the last 2 characters, then the score of that prediction will be 0.8.
Multi line	General	Select this checkbox for fields that may span multiple text lines, such as addresses or descriptions. Otherwise, only the first line is returned.
Hidden	General	Select this checkbox if you do not want this field to be part of exported datasets.
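
The Post processing options in the table above can be read as simple selection rules over the candidate values predicted on a page. The sketch below assumes each candidate carries a text value and a confidence, and that candidates are listed in reading order. The `Candidate` type and the `pick` function are hypothetical names used for illustration only.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    value: str
    confidence: float


def pick(candidates: List[Candidate], strategy: str) -> str:
    """Return one value from candidates given in reading order."""
    if strategy == "highest_confidence":
        return max(candidates, key=lambda c: c.confidence).value
    if strategy == "first_span":
        return candidates[0].value
    if strategy == "largest_value":
        # meaningful for number fields only
        return max(candidates, key=lambda c: float(c.value)).value
    if strategy == "longest_value":
        return max(candidates, key=lambda c: len(c.value)).value
    raise ValueError(f"unknown strategy: {strategy}")


# Three hypothetical predictions for a Total Amount field on one page.
found = [
    Candidate("120.50", 0.71),
    Candidate("1450.00", 0.64),
    Candidate("99", 0.93),
]
print(pick(found, "highest_confidence"))  # 99
print(pick(found, "first_span"))          # 120.50
print(pick(found, "largest_value"))       # 1450.00
print(pick(found, "longest_value"))       # 1450.00
```

In this example largest_value returns the invoice total even though its confidence is the lowest, which matches the guide's note that this option suits Total Amount fields.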

Delete a Regular Field

To delete a regular field, follow these steps:

Click the Edit field button corresponding to the regular field you want to delete.
Click the Delete button.
Type the exact name of the field.
Click OK.
The regular field and its associated labeled data are deleted.

Classification Fields

These are data points that refer to a document as a whole. For instance, the Expense Type of a receipt (Food, Hotel, Airline, Transportation) or the Currency of an invoice (USD, EUR, JPY) would be examples of Classification fields.

Create a New Classification Field

Click on the right pane in the Classification Fields section. The Create a new classification field window is displayed.
Fill in a unique name for the field in the Enter unique field name field. The field does not accept uppercase letters. It can only contain lowercase letters, numbers, underscore _ and dash -.
Click OK. The Edit Field window is displayed.
In the text area, type the names of the classes as a comma-separated list.
Click Save to save your settings.
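
The naming rule that applies to all three field types (lowercase letters, numbers, underscore and dash only, and a unique name) can be checked before a schema is built, for example when preparing field names in a script. The sketch below is a convenience check under those stated rules plus the 300-field limit. The function names are hypothetical and nothing here calls the product.

```python
import re
from typing import Collection

# Lowercase letters, digits, underscore and dash; at least one character.
FIELD_NAME = re.compile(r"[a-z0-9_-]+")
MAX_FIELDS = 300  # limit stated in this guide


def is_valid_field_name(name: str) -> bool:
    """True if the name uses only the allowed characters."""
    return FIELD_NAME.fullmatch(name) is not None


def can_add_field(name: str, existing: Collection[str]) -> bool:
    """True if the name is valid, unique, and the field limit is not reached."""
    return (
        is_valid_field_name(name)
        and name not in existing
        and len(existing) < MAX_FIELDS
    )


existing = {"invoice-no", "total_amount"}
print(is_valid_field_name("expense-type"))       # True
print(is_valid_field_name("ExpenseType"))        # False
print(can_add_field("invoice-no", existing))     # False
```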

Edit a Classification Field

Click the Edit field button. Define a list of possible values, separated by commas. An optional description of a value may be included after a colon `:` (option 1 : description 1).
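
The value list format can be made concrete with a small parsing sketch. It assumes that surrounding whitespace is not significant and that only the first colon in an entry separates the value from its description. The function name `parse_classes` is hypothetical.

```python
from typing import Dict, Optional


def parse_classes(text: str) -> Dict[str, Optional[str]]:
    """Parse 'value : description, value, ...' into a mapping.

    Values without a description map to None.
    """
    classes: Dict[str, Optional[str]] = {}
    for entry in text.split(","):
        entry = entry.strip()
        if not entry:
            continue  # ignore empty entries such as a trailing comma
        name, sep, description = entry.partition(":")
        classes[name.strip()] = description.strip() if sep else None
    return classes


print(parse_classes("USD : US dollar, EUR : euro, JPY"))
# {'USD': 'US dollar', 'EUR': 'euro', 'JPY': None}
```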

Delete a Classification Field

To delete a classification field, follow these steps:

Click the Edit field button corresponding to the classification field you want to delete.
Click the Delete button.
Type the exact name of the field.
Click OK.
The classification field and its associated labeled data are deleted.
