UiPath Document Understanding

Use Document Manager

This page describes how to use Document Manager to label a new dataset and retrain an ML model.

Access and configure Document Manager

Launch the created data labeling session in First Run Experience and go to the settings to configure the OCR.

Choose the OCR you intend to use in the OCR method dropdown menu. For UiPathDocumentOCR, paste the Document Understanding license key (retrieve the Document Understanding API key from the Admin > License page) and then paste the OCR URL you generated when you deployed UiPathDocumentOCR.

Configure prelabeling with the models that you deployed by following the instructions here. Paste the public ML Skill endpoint of the model and the Document Understanding license key, and then click Save.
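Before pasting values into the settings dialog, it can help to sanity-check them. The following is a minimal sketch, not part of Document Manager: the `DocumentManagerSettings` class, the `find_problems` helper, and the `example.com` URLs are all hypothetical names chosen for illustration. It only checks that the license key is present and that both URLs look like well-formed http(s) addresses.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class DocumentManagerSettings:
    """Hypothetical container for the values entered in the settings dialog."""
    ocr_method: str
    ocr_url: str
    license_key: str
    ml_skill_endpoint: str


def find_problems(settings: DocumentManagerSettings) -> list[str]:
    """Return human-readable problems; an empty list means the values look plausible."""
    problems = []
    if not settings.ocr_method.strip():
        problems.append("OCR method is empty")
    if not settings.license_key.strip():
        problems.append("license key is empty")
    # Both endpoints are pasted as URLs, so check scheme and host for each.
    for name, url in (("OCR URL", settings.ocr_url),
                      ("ML Skill endpoint", settings.ml_skill_endpoint)):
        parsed = urlparse(url.strip())
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"{name} is not a valid http(s) URL")
    return problems


# Placeholder values only; replace them with your own key and URLs.
settings = DocumentManagerSettings(
    ocr_method="UiPathDocumentOCR",
    ocr_url="https://example.com/ocr",
    license_key="YOUR-LICENSE-KEY",
    ml_skill_endpoint="https://example.com/ml-skill",
)
print(find_problems(settings))
```

A check like this only catches copy-and-paste mistakes such as a missing scheme or an empty key. It does not confirm that the endpoints are reachable or that the key is valid.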

For more details, please check the documentation here: Configure Document Manager.

Import Document

Click the Import button in the Document Manager session.

Name the dataset and click Browse files to upload.

Select the document you want to upload.

Click YES.

For more details, please check the documentation here: Import Documents.

Create extraction fields

Click the create field button to create the fields to be extracted.

You can create up to 40 fields.

For this validation exercise, you can create some common invoice fields such as date, name, invoice-no, and total. Make sure to set the content type of each field accordingly: date (date), name (string), invoice-no (string), and total (number).
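The field list for this exercise can be written down as a small mapping before you create the fields in the UI. This is an illustrative sketch only: `INVOICE_FIELDS`, `ALLOWED_TYPES`, and `check_fields` are hypothetical names, and `ALLOWED_TYPES` covers just the three content types used in this exercise, not every type Document Manager may offer.

```python
MAX_FIELDS = 40  # limit stated in this guide

# Assumption: only the content types used in this exercise are listed here.
ALLOWED_TYPES = {"date", "string", "number"}

# Field name -> content type, as described in the paragraph above.
INVOICE_FIELDS = {
    "date": "date",
    "name": "string",
    "invoice-no": "string",
    "total": "number",
}


def check_fields(fields: dict[str, str]) -> None:
    """Raise ValueError if the plan exceeds the field limit or uses an unknown type."""
    if len(fields) > MAX_FIELDS:
        raise ValueError(f"{len(fields)} fields exceed the limit of {MAX_FIELDS}")
    for name, content_type in fields.items():
        if content_type not in ALLOWED_TYPES:
            raise ValueError(f"field {name!r} has unexpected type {content_type!r}")


check_fields(INVOICE_FIELDS)  # passes silently for the four fields above
```

Keeping the plan in one place makes it easier to spot a field whose content type was left at the wrong value.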

For more details, please check the documentation here: Create & Configure Fields.

Label documents

Now you can start to label the documents.

Click the Predict button at the top to have the base invoice model predict the labels for the defined fields, and correct any prediction that is wrong.

To change a label, drag the mouse over the value in the document and press the keyboard shortcut of the field (for example, d to label the date).

Use the arrow at the top to move to the next document until you have validated the labels for all uploaded invoices.

Note: Because the Invoices base model already performs well and the sample invoice is simple, with little variation, the prediction accuracy is close to 100% in this case, and you may not need to correct any labels.

For more details about labeling documents, please check the documentation here: Label Documents.

Export documents

Make sure the correct dataset is selected in the dataset filter, and then click the Export button.

Click Export.

Go to Datasets under the same AI Center project. The exported training dataset should be listed there.

For more details, please check the documentation: Export Documents.

Train a custom model on AI Center

Go to Pipelines > Create new. Select the evaluation run type, the model package, and the input dataset.

Select the subfolder under Export as the input dataset.

Click Create to start the pipeline. It may take 1-2 hours for the pipeline to run on CPU machines.

Deploy the retrained ML model as an ML Skill

Go to ML Skills and create a new ML Skill.

Choose the same invoice model package that you created before. Because the model has been retrained, there is now a new minor package version (1 instead of 0). Make sure to select the latest version.
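The rule "select the latest version" can be sketched as a comparison of (major, minor) pairs. This is an illustration of the selection logic only, not an AI Center API: the `latest_version` helper and the major version number 1 used below are assumptions, since this guide only states that the minor version moves from 0 to 1.

```python
def latest_version(versions: list[tuple[int, int]]) -> tuple[int, int]:
    """Return the highest (major, minor) pair; tuples compare major first, then minor."""
    if not versions:
        raise ValueError("no package versions available")
    return max(versions)


# Hypothetical example: minor version 0 is the original package,
# minor version 1 is the package produced by the retraining pipeline.
available = [(1, 0), (1, 1)]
major, minor = latest_version(available)
print(f"Select version {major}.{minor}")
```

Comparing tuples rather than strings avoids the usual pitfall where "10" sorts before "9" in text order.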

Once the ML Skill is created, go to Modify current deployment to make the ML Skill public. Switch the toggle and click Confirm.

Copy the URL of the public ML Skill for later use.

Congrats! You have now retrained an Invoice model with your own dataset and created the endpoint to access the model.

Updated about a month ago

