UiPath AI Fabric

Configuring Data Manager

Before you begin, create a working folder to hold your ML data. All commands documented below reference this folder.
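As a minimal sketch, the working folder can be created up front and its absolute path kept in a variable for later use. The folder name datamanager_data and the variable name DM_WORKDIR are placeholders chosen for this example; Data Manager does not require them.

```shell
# Hypothetical folder name; any location you can write to should work.
DM_WORKDIR="${DM_WORKDIR:-$PWD/datamanager_data}"

# -p creates missing parent folders and does nothing if the folder already exists.
mkdir -p "$DM_WORKDIR"

# Print the absolute path to use in place of <path_to_working_folder>.
echo "Working folder: $DM_WORKDIR"
```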

📘

Note

Run the configuration steps below before launching Data Manager. If you later need to change the configuration (for example, the OCR engine or a user password), stop Data Manager using the docker stop command, run the configuration commands, and then launch Data Manager again. For details on these commands, see the Docker cheat sheet.
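The stop, configure, and relaunch sequence described in this note could be sketched as follows. The helper only prints the three commands so they can be reviewed before being run by hand; the function name reconfigure_plan, the container name, the port, and the folder path are all hypothetical values chosen for illustration.

```shell
IMAGE="aiflprodweacr.azurecr.io/datamanager:latest"

# Print the stop / configure / relaunch sequence for review.
# Arguments: container name or ID, host port, working folder, then the
# configuration flags to apply (for example --user and --passw).
reconfigure_plan() {
  container="$1"; port="$2"; workdir="$3"
  shift 3
  echo "docker stop ${container}"
  echo "docker run --rm -it -p ${port}:80 -v \"${workdir}:/app/data\" ${IMAGE} --license-agreement accept $*"
  echo "docker run -d -p ${port}:80 -v \"${workdir}:/app/data\" ${IMAGE} --license-agreement accept"
}

# Placeholder values; the real container ID can be found with `docker ps`.
reconfigure_plan my_container 8080 /srv/dm_data --user alice --passw '<password>'
```

Printing the commands instead of executing them keeps the sketch safe to try on a machine where Data Manager is already running.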

Adding users

By default, an admin user is created with the username admin and the password admin.
To create new user credentials, or to change the password of an existing user, use the following command:

docker run --rm -it -p <port_number>:80 -v "<path_to_working_folder>:/app/data" aiflprodweacr.azurecr.io/datamanager:latest --license-agreement accept --user <username> --passw <password>

Configuring the OCR Service

To import documents into Data Manager, you must configure an OCR service.
To do so, use the following command:

docker run --rm -it -p <port_number>:80 -v "<path_to_working_folder>:/app/data" aiflprodweacr.azurecr.io/datamanager:latest --license-agreement accept --ocr-method <METHOD> --ocr-url <URL> --ocr-key <KEY>

If you are running the OCR engine on the same machine as Data Manager, then for --ocr-url you must use http://<local_IP_address>:<port_number>. You can find the local IP address by typing ifconfig (or ip addr) in the Linux terminal, or ipconfig in Windows PowerShell. Do not use "localhost" to refer to the local machine.
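A small helper can guard against passing "localhost" by mistake when building the --ocr-url value. This is only a sketch: the function name build_ocr_url and the example address and port are hypothetical, and rejecting 127.* loopback addresses as well is an assumption based on the fact that, inside a container, they normally point at the container itself and not at the host.

```shell
# Build the --ocr-url value from an IP address and a port.
# Refuses "localhost", loopback addresses (assumption), and empty input.
build_ocr_url() {
  ip="$1"; port="$2"
  case "$ip" in
    localhost|127.*|"")
      echo "error: use the machine's local network IP, not '${ip}'" >&2
      return 1
      ;;
  esac
  echo "http://${ip}:${port}"
}

# Example with a placeholder address and port.
build_ocr_url 192.168.1.20 5000
```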

Choosing the OCR Engine

🚧

Important

Choosing the OCR engine to be used for importing documents into Data Manager is a critical decision.
We recommend importing training data (train time) with the same OCR engine that will be used when the model is deployed (run time). Ideally, try a few different engines to see which works best on your documents, and only then make a decision.

The on-premises options are the UiPath OCR container, the Omnipage OCR container (both available from UiPath), and the Microsoft Read container (available as a preview from Microsoft).
UiPath OCR and Microsoft Read are focused on the English language, or on text containing few accented characters. Omnipage works best on cleanly scanned documents and has the best language coverage of the on-premises options.
The cloud-based options are Google Cloud OCR and Microsoft Read Azure OCR. Of these, Google Cloud OCR has the best language coverage.

Configuring Prelabeling

If you already have a model that can extract some of the fields that need labeling, and only a few extra fields require manual labeling, you can save a lot of time by using Data Manager's Prelabeling feature.
To activate the Prelabeling feature, use the following command:

docker run --rm -it -p <port_number>:80 -v "<path_to_working_folder>:/app/data" aiflprodweacr.azurecr.io/datamanager:latest --license-agreement accept --prelabelling-url <URL> --prelabelling-key <API-KEY>

If you are running the prelabeling ML model on the same machine as Data Manager, then for --prelabelling-url you must use the URL of the public ML Skill from AI Fabric.

After the feature is activated, a Predict button is displayed on the top bar in Data Manager.
Click it to prelabel the current document.

Launching Data Manager

To launch Data Manager, use the following command:

docker run -d -p <port_number>:80 -v "<path_to_working_folder>:/app/data" aiflprodweacr.azurecr.io/datamanager:latest --license-agreement accept

Open a web browser and enter the following URL: http://localhost:<port_number>
If you are using the browser on a different machine, replace localhost with the IP address of the machine where the datamanager container is running.

Enabling SSL Encryption (https)

SSL encryption is not necessary when running Data Manager on your own machine or on a secure office network. However, if you plan to run Data Manager on a remote server open to the Internet, we strongly suggest that you enable it.
To do this, obtain the DNS name of the remote server, generate an HTTPS certificate (.crt file) and key (.key file) for that domain name, and place both files in a folder called certs on the remote server. Then launch Data Manager using the following command:

docker run -d -p <port_number>:80 -v "<path_to_working_folder>:/app/data" -v "<path_to_certs_folder>:/certs" aiflprodweacr.azurecr.io/datamanager:latest --license-agreement accept --https-certificate /certs/<cert_filename.crt> --https-private-key /certs/<key_filename.key>

In this command, <cert_filename.crt> refers to the name of the .crt file and <key_filename.key> refers to the name of the .key file which you have placed in the certs folder.
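For a quick trial before obtaining a certificate from a certificate authority, a self-signed certificate and key can be generated with OpenSSL, as sketched below. This is an assumption about one possible workflow, not a UiPath recommendation: browsers will warn about self-signed certificates, and the DNS name, folder, and file names used here are placeholders.

```shell
# Placeholder values; replace with your server's DNS name and preferred folder.
DNS_NAME="${DNS_NAME:-datamanager.example.com}"
CERTS_DIR="${CERTS_DIR:-$PWD/certs}"

mkdir -p "$CERTS_DIR"

# Create a 2048-bit RSA key and a self-signed certificate valid for 365 days.
# -nodes leaves the private key unencrypted so no passphrase is needed to read it.
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout "$CERTS_DIR/datamanager.key" \
  -out "$CERTS_DIR/datamanager.crt" \
  -days 365 \
  -subj "/CN=$DNS_NAME"

# Show the subject and expiry date of the generated certificate.
openssl x509 -in "$CERTS_DIR/datamanager.crt" -noout -subject -enddate
```

With these names, <path_to_certs_folder> would be the value of CERTS_DIR, <cert_filename.crt> would be datamanager.crt, and <key_filename.key> would be datamanager.key.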

Using a predefined schema

To use the Retraining capability in AI Fabric, you need to use a set of fields based on the fields already extracted by the out-of-the-box pretrained models offered by UiPath (for example, Invoices and Receipts extraction). This list of fields is called a schema.
To make it easier to get started, we provide the schemas of the out-of-the-box models as zip files. You can import them into Data Manager just as you would import a dataset: click the Import button at the top of the screen, and then select the zip file from the dialog. Data Manager detects that it is a new schema and imports it directly.
The schemas for the pretrained ML models provided by UiPath are available at the following links:
Invoices:
https://raw.githubusercontent.com/UiPath/Infrastructure/master/ML/AiFabric/schema/invoices/schema.zip
Receipts:
https://raw.githubusercontent.com/UiPath/Infrastructure/master/ML/AiFabric/schema/receipts/schema.zip
Purchase Orders:
https://raw.githubusercontent.com/UiPath/Infrastructure/master/ML/AiFabric/schema/puchase_orders/schema.zip
Utility Bills:
https://raw.githubusercontent.com/UiPath/Infrastructure/master/ML/AiFabric/schema/utility_bills/schema.zip
Invoices-India:
https://raw.githubusercontent.com/UiPath/Infrastructure/master/ML/AiFabric/schema/invoices-india/schema.zip
Invoices-Australia:
https://raw.githubusercontent.com/UiPath/Infrastructure/master/ML/AiFabric/schema/invoices-australia/schema.zip
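The schema files can also be fetched from the command line before importing them. The sketch below assumes curl is installed; the function names schema_url and download_schema and the output file naming are hypothetical, and the folder name passed in must match the one in the corresponding link above exactly.

```shell
BASE_URL="https://raw.githubusercontent.com/UiPath/Infrastructure/master/ML/AiFabric/schema"

# Print the schema.zip URL for a schema folder name taken from the links above.
schema_url() {
  echo "${BASE_URL}/${1}/schema.zip"
}

# Download one schema to <name>_schema.zip in the current folder.
# -f fails on HTTP errors, -L follows redirects, -sS hides progress but shows errors.
download_schema() {
  curl -fsSL -o "${1}_schema.zip" "$(schema_url "$1")"
}

# Example (requires network access):
#   download_schema invoices
schema_url invoices
```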


