First, create a working folder to hold your ML data. This folder is referenced as <path_to_working_folder> in all the commands below.
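For example, on Linux or macOS the working folder could be created as in the sketch below. The folder name datamanager_data is only a placeholder; any location you can write to will do.

```shell
# Hypothetical folder name; choose any location you have write access to.
WORKDIR="$PWD/datamanager_data"

# -p creates missing parent folders and does not fail if the folder exists.
mkdir -p "$WORKDIR"

# Print the absolute path; this is the value to substitute for
# <path_to_working_folder> in the docker commands.
echo "$WORKDIR"
```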
Run the configuration steps below before launching Data Manager. If you later need to change the configuration (for example, the OCR engine or a user password), stop Data Manager with the docker stop command, run the configuration commands, and then launch Data Manager again. See the Docker cheat sheet for details.
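The stop, configure, and relaunch cycle might look like the following sketch. It only prints the commands (a dry run) so that it can be inspected safely. The port and folder values are placeholders, and <container_id> stands for the ID reported by docker ps. The run helper is hypothetical and not part of Docker or Data Manager.

```shell
# Dry-run sketch of the stop -> configure -> relaunch cycle.
# All values below are placeholders, not defaults of Data Manager.
IMAGE="aiflprodweacr.azurecr.io/datamanager:latest"
PORT=8080
WORKDIR="$PWD/datamanager_data"

# Prints each command instead of running it.
# Replace the function body with:  "$@"  to execute the commands for real.
run() { printf '+ %s\n' "$*"; }

# 1. Stop the running container (look up its ID with: docker ps).
run docker stop "<container_id>"

# 2. Run a configuration command, here changing a user password.
run docker run --rm -it -p "$PORT:80" -v "$WORKDIR:/app/data" "$IMAGE" \
    --license-agreement accept --user "<username>" --passw "<password>"

# 3. Launch Data Manager again in the background.
run docker run -d -p "$PORT:80" -v "$WORKDIR:/app/data" "$IMAGE" \
    --license-agreement accept
```

Keeping the image name, port, and working folder in variables helps ensure that the configuration command and the launch command point at the same data.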
By default, an admin user is created with the username admin and the password admin.
To create new user credentials, or modify the password of an existing user, use the following command:
docker run --rm -it -p <port_number>:80 -v "<path_to_working_folder>:/app/data" aiflprodweacr.azurecr.io/datamanager:latest --license-agreement accept --user <username> --passw <password>
You must configure an OCR service before you can import documents into Data Manager.
To configure an OCR service, use this command:
docker run --rm -it -p <port_number>:80 -v "<path_to_working_folder>:/app/data" aiflprodweacr.azurecr.io/datamanager:latest --license-agreement accept --ocr-method <METHOD> --ocr-url <URL> --ocr-key <KEY>
If the OCR engine runs on the same machine as Data Manager, set --ocr-url to http://<local_IP_address>:<port_number>. You can find the local IP address by running ifconfig (or ip addr) in a Linux terminal, or ipconfig in Windows PowerShell. Do not use "localhost" to refer to the local machine.
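The reason for avoiding "localhost" is that, with Docker's default networking, localhost inside the container refers to the container itself and not to the host machine. The sketch below shows one way to build the URL and guard against that mistake. The ocr_url helper is hypothetical, and the address and port are placeholders.

```shell
# ocr_url is a hypothetical helper, not part of Data Manager.
# Usage: ocr_url <local_IP_address> <port_number>
ocr_url() {
    case "$1" in
        localhost|127.*|"")
            # Inside the container these refer to the container itself.
            echo "error: use the machine's local IP address, not '$1'" >&2
            return 1
            ;;
    esac
    echo "http://$1:$2"
}

# Example with a placeholder address and port.
ocr_url 192.168.1.20 5000
```

On many Linux distributions, hostname -I lists the machine's IP addresses and can supply the first argument.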
Choosing the OCR engine to be used for importing documents into Data Manager is a critical decision.
We recommend importing training data (train time) with the same OCR engine that will be used when the model is deployed (run time). Ideally, try a few different engines on your documents to see which works best, and only then make a decision.
The on-premises options are the UiPath OCR container and the Omnipage OCR container (both available from UiPath), and the Microsoft Read container (available as a preview from Microsoft).
UiPath OCR and Microsoft Read are focused on the English language, or text containing few accented characters. Omnipage works best on cleanly scanned documents and has the best language coverage.
The cloud-based options are Google Cloud OCR and Microsoft Read Azure OCR. Google Cloud OCR has the best language coverage.
If you already have a model which can extract some of the fields that need labeling, and there are only a few extra fields that require manual labeling, you can save a lot of time by using Data Manager’s Prelabelling feature.
To activate the Prelabelling feature, use the following command:
docker run --rm -it -p <port_number>:80 -v "<path_to_working_folder>:/app/data" aiflprodweacr.azurecr.io/datamanager:latest --license-agreement accept --prelabelling-url <URL> --prelabelling-key <API-KEY>
If you are running the prelabelling ML model on the same machine as Data Manager, then for the --prelabelling-url you must use the URL of the public ML Skill from AI Fabric.
After activating it, a Predict button is displayed on the top bar in Data Manager.
Click it in order to prelabel the current document.
To launch Data Manager, use the following command:
docker run -d -p <port_number>:80 -v "<path_to_working_folder>:/app/data" aiflprodweacr.azurecr.io/datamanager:latest --license-agreement accept
Open a web browser and enter the following URL:
If you are using the browser on a different machine, replace localhost with the IP address of the machine where the datamanager container is running.
Enabling SSL encryption is not necessary when you run Data Manager on your own machine or on a secure office network. However, if you plan to run Data Manager on a remote server open to the Internet, we strongly suggest that you enable it. To do so, obtain the DNS name of the remote server, generate an HTTPS certificate (.crt file) and key (.key file) for that domain name, and place them in a folder called certs on the remote server. Then launch Data Manager using the following command:
docker run -d -p <port_number>:80 -v "<path_to_working_folder>:/app/data" -v "<path_to_certs_folder>:/certs" aiflprodweacr.azurecr.io/datamanager:latest --license-agreement accept --https-certificate /certs/<cert_filename.crt> --https-private-key /certs/<key_filename.key>
In this command, <path_to_certs_folder> is the path to the certs folder on the remote server, <cert_filename.crt> is the name of the .crt file, and <key_filename.key> is the name of the .key file that you placed in that folder.
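For a trial run before a certificate issued by a certificate authority is available, a self-signed pair can be generated with OpenSSL, as sketched below. This assumes that openssl is installed; the domain name and file names are placeholders. Browsers will warn about self-signed certificates, so this is not a substitute for a properly issued certificate on an Internet-facing server.

```shell
# Placeholder values; replace with the real DNS name of the remote server.
DOMAIN="datamanager.example.com"
CERTS_DIR="$PWD/certs"

mkdir -p "$CERTS_DIR"

# Create a 2048-bit RSA key and a self-signed certificate valid for 365 days.
# -nodes leaves the private key unencrypted (no passphrase).
openssl req -x509 -newkey rsa:2048 -nodes \
    -keyout "$CERTS_DIR/$DOMAIN.key" \
    -out "$CERTS_DIR/$DOMAIN.crt" \
    -days 365 -subj "/CN=$DOMAIN"

# Show the subject to confirm the certificate was written for the right name.
openssl x509 -in "$CERTS_DIR/$DOMAIN.crt" -noout -subject
```

With these placeholder names, <path_to_certs_folder> would be the value of CERTS_DIR, and the two file names would take the place of <cert_filename.crt> and <key_filename.key>.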
To use the Retraining capability in AI Fabric, you must use a set of fields based on the fields already extracted by the out-of-the-box pretrained models offered by UiPath (Invoice and Receipts extraction). This list of fields is called a schema. To make it easier to get started, we provide the schemas of the out-of-the-box models as zip files. You can import them into Data Manager just as you would import a dataset: click the Import button at the top of the screen, then select the zip file in the dialog. Data Manager detects that the file is a new schema and imports it directly.
The schemas for the pretrained ML models provided by UiPath are available at the following links: