UiPath Document Understanding

Data Manager

UiPath Data Manager is a lightweight web application delivered as a Docker container. It enables multiple users to perform a variety of operations:

Define and configure the fields to be extracted by an ML model.
Import documents for labeling.
Prelabel documents with an existing ML model, either one that UiPath provides out of the box (such as Invoice Extraction or Receipt Extraction) or one trained in AI Center.
Label documents.
Export documents in the format expected by the AI Center Training pipelines.

The User Interface

The Data Manager interface contains the following panels:

Management Bar

Displayed at the top of the page in Data Manager.

Enables you to navigate between documents, delete or restore a document, search and filter documents, run AI model predictions, and import and export documents.

Here are the options available in the management bar:

  • Navigation: Navigate between documents that match the active filter. The counter between the two arrows shows the position of the current document out of the total number of documents that match the active search/filter.
  • Search: Filter documents by words they contain, by document name, or by keyword. The active filter is also applied when exporting documents.
  • Delete / Restore: Delete or restore a document. Deleted documents can be found under the deleted filter.
  • Predict: Run AI model predictions and display the results.
  • Import: Import new documents to be labeled or revised.
  • Export: Export labeled data. The active search/filter applies to the exported data.
  • Document: Open the document in a new tab. On the right-hand side, you can also see the name of the currently active document and its type: Training document, Test document, or Validation document.
  • Settings: Configure the OCR or Prelabelling settings, and access the How to... panel.

Search


The search box has two functions: you can search for specific text in the content of the imported documents, or you can filter the documents using keywords.

There are seven predefined keywords, namely:
train-validate-set
train-set
test-set
validate-set
deleted
labelled
unlabelled

Besides these predefined keywords, you can also filter by named batch, with one filter available for each batch you imported into Data Manager:
batch:<batch_name_1>
batch:<batch_name_2>
batch:<batch_name_3>
etc.

Search/filter scenarios

  • You can search using one word of text: only the documents containing that specific word are displayed.
  • You can search using more than one word of text: only the documents containing those specific words consecutively, in the order typed, are displayed.

📘

Note:

The search is case-insensitive.

  • You can filter using a keyword: for instance, if you select labelled, only the labeled documents are displayed.
  • You can filter using more than one keyword: for instance, if you select labelled and train-set, only the labeled documents marked as training documents are displayed. The order of the keywords does not matter.
  • You can also combine text with keywords: for instance, if you type payment and select labelled, only the labeled documents containing the word payment are displayed.

🚧

Warning!

You cannot use the predefined keywords as text search terms.
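
The search and filter scenarios above can be illustrated with a short Python sketch. It is not how Data Manager is implemented: the `Document` class, its `tags` attribute, and the `parse_query`, `contains_phrase`, and `matches` functions are hypothetical names introduced only for illustration. The sketch assumes that each document carries the set of keywords and `batch:<name>` entries that apply to it, that any token equal to a predefined keyword is treated as a filter rather than as text, and that words are separated by whitespace.

```python
from dataclasses import dataclass, field

# The seven predefined keywords listed in the Search section.
KEYWORDS = {
    "train-validate-set", "train-set", "test-set", "validate-set",
    "deleted", "labelled", "unlabelled",
}


@dataclass
class Document:
    """Hypothetical stand-in for an imported document."""
    name: str
    text: str
    tags: set = field(default_factory=set)  # keywords and batch:<name> entries that apply


def parse_query(query: str):
    """Split a query into filter tokens and the remaining words of text."""
    filters, phrase = set(), []
    for token in query.lower().split():
        if token in KEYWORDS or token.startswith("batch:"):
            filters.add(token)
        else:
            phrase.append(token)
    return filters, phrase


def contains_phrase(text: str, phrase: list) -> bool:
    """True if the words of `phrase` appear one after another in `text`, ignoring case."""
    words = text.lower().split()
    n = len(phrase)
    return any(words[i:i + n] == phrase for i in range(len(words) - n + 1))


def matches(doc: Document, query: str) -> bool:
    filters, phrase = parse_query(query)
    # Every selected keyword must apply; the order of the keywords does not matter.
    if not filters <= {tag.lower() for tag in doc.tags}:
        return False
    if not phrase:
        return True
    # Text can match the document content or the document name.
    return contains_phrase(doc.text, phrase) or " ".join(phrase) in doc.name.lower()


invoice = Document("inv1.pdf", "Payment due in 30 days", {"labelled", "train-set"})
print(matches(invoice, "PAYMENT labelled"))
print(matches(invoice, "due payment"))
```

In this sketch, the first query matches because the text search ignores case and the keyword applies, while the second does not because the two words do not appear in that order.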

Predict


After you activate Prelabelling, a Predict button appears in the management bar. Click it to prelabel the current document.

Settings


The settings button has two available options:

  • Settings, where you can configure the OCR or Prelabelling.
  • How to..., which serves as a help menu.

OCR

You must configure an OCR service before you can import documents into Data Manager. The following options are available:

OCR method

The cloud-based options are:

  • UiPath Document OCR (https://du.uipath.com/ocr)
  • Google Cloud Vision OCR, which has the best language coverage
  • Google Cloud Vision OCR for Japanese, which is optimal for reading Japanese documents
  • Microsoft Read OCR

OCR URL

To configure OCR, you need the URL of the OCR service. Here are the possible URLs you can use:

OCR key

The Document Understanding API Key. Mandatory for Data Manager Cloud and Data Manager On-Prem Online. It is not required for Data Manager On-Prem Air-gapped.

Prelabelling

If you already have a model which can extract some of the fields that need labeling, and there are only a few extra fields that require manual labeling, you can save a lot of time by using Data Manager’s Prelabelling feature. The following options are available:

Prelabelling URL

Prelabelling requires the URL of the ML model. Here are the possible URLs you can use:

ML Skills in AI Center Cloud can be used for prelabeling in Data Manager if they are exposed as Public ML Skills.

Prelabelling key

The Document Understanding API Key. Mandatory for Data Manager Cloud and Data Manager On-Prem Online. It is not required for Data Manager On-Prem Air-gapped.
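
The key requirements stated for the OCR key and the Prelabelling key can be summarized as a small check. The following Python sketch is an illustration under assumptions, not part of Data Manager: the `Deployment` enum, the `ServiceSettings` class, and the `validate` function are hypothetical names, and the URL in the usage lines is the UiPath Document OCR address listed under OCR method.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Deployment(Enum):
    CLOUD = "Data Manager Cloud"
    ON_PREM_ONLINE = "Data Manager On-Prem Online"
    ON_PREM_AIR_GAPPED = "Data Manager On-Prem Air-gapped"


@dataclass
class ServiceSettings:
    """Hypothetical container for the URL and key of an OCR or Prelabelling service."""
    url: str
    api_key: Optional[str] = None


def validate(settings: ServiceSettings, deployment: Deployment) -> list:
    """Return a list of problems; an empty list means the settings look complete."""
    problems = []
    if not settings.url:
        problems.append("a service URL is required")
    # The Document Understanding API Key is mandatory except for air-gapped installations.
    if deployment is not Deployment.ON_PREM_AIR_GAPPED and not settings.api_key:
        problems.append(f"an API key is mandatory for {deployment.value}")
    return problems


ocr = ServiceSettings(url="https://du.uipath.com/ocr")
print(validate(ocr, Deployment.CLOUD))               # reports the missing key
print(validate(ocr, Deployment.ON_PREM_AIR_GAPPED))  # no problems reported
```

The same check applies to both the OCR and the Prelabelling settings, because the two keys have identical requirements.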

How to...

The How to... option opens the Data Manager help menu, where you can find:

  • The Documentation link leading to this documentation page.
  • The Labeling Controls section which displays the controls to be used when handling data.
  • The Document Shortcuts section which displays the shortcuts used to perform various operations such as navigation and UI scaling.
  • The Configuration section which displays details about the instance configuration as performed during installation.

Column Fields

Column fields have the following options:

  • Create a new column field
  • Edit field
  • Expand/collapse column field values

For more details on column fields, visit this section.

Regular Fields

Regular fields have the following options:

  • Create a new regular field
  • Edit field

For more details on regular fields, visit this section.

Classification Fields

Classification fields have the following options:

  • Create a new classification field
  • Edit field

For more details on classification fields, visit this section.

Document View

In document view you can label documents by selecting the word boxes and assigning them to a field by pressing a key.

For more details on how to label documents, visit this page.

In document view you can also right-click the word box and verify the extracted information.

To zoom in or out, use CTRL + mouse scroll.

When you open a new Data Manager session or when you have an empty filter, guidelines are displayed in document view.

Loading failures are also displayed in document view.

Updated 13 days ago

