UiPath Document Understanding

Install Data Manager

Before proceeding, make sure you meet the Requirements and install the Prerequisites.

Requirements

This section details the hardware and software requirements for installing Data Manager.

Hardware Requirements


Machines Involved: VM in the Cloud or On-Prem Box or Laptop
Operating Systems: Windows (Windows 10) or Linux (Ubuntu/CentOS/RedHat)
Computing Engines: CPU
OCR: Required

CPU Cores: 1
RAM (GB): 4
HDD (GB): 30

Software Requirements


Linux Operating System

If you install the product on a VM in the cloud, the following operating systems are supported:

Ubuntu: 20.04 LTS, 18.04 LTS, 16.04 LTS
RHEL: 7.x

If you install the product on a machine in an on-premises data center, the following operating systems are supported:

Ubuntu: 20.04 LTS, 18.04 LTS, 16.04 LTS
RHEL: 7.x
CentOS: 7.x

Windows Operating System

See the official Docker website for the list of Windows operating systems supported.

On Windows, virtualization must be enabled on your machine. We strongly recommend doing this only on physical machines such as laptops or desktop workstations. Running Docker on Windows inside virtual machines (cloud or data center) using nested virtualization is not supported.

Browsers

Google Chrome: 50+

Network Configuration


(Optional) If prelabeling is needed, Data Manager needs access to AI Center on-premises or to public SaaS endpoints such as https://invoices.uipath.com.

Data Manager needs access to the OCR engine at <IP>:<port_number>. The OCR engine can be UiPath Document OCR on-premises, Omnipage OCR on-premises, Google Cloud Vision OCR, Microsoft Read Azure, or Microsoft Read on-premises.
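
Before installing, it can help to confirm that the machine can open a connection to the OCR engine. The following is a minimal sketch, assuming bash and the coreutils timeout command are available. The helper name can_reach is hypothetical, and the address in the usage comment is a placeholder for your own OCR endpoint.

```shell
# Hypothetical helper: succeed if a TCP connection to host:port can be
# opened within 3 seconds. Relies on bash's built-in /dev/tcp redirection.
can_reach() {
  host="$1"
  port="$2"
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Usage (replace the placeholders with your OCR engine address):
#   can_reach <IP> <port_number> && echo "OCR engine port is reachable"
```

A successful connection only shows that the port is open from this machine. It does not confirm that the OCR service behind it is working.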

Minimal Trial or Proof-of-Concept Configuration


If you want to try training a custom model as a demo on a small volume of data (under 100 documents), then it is possible to run Data Manager on your personal Windows 10 laptop.

Prerequisites

Data Manager is a containerized application that runs on top of Docker. You cannot run it on the same machine as AI Center on-premises. To run it on a separate machine, use the prerequisites installer commands below to set up Docker and, optionally, the NVIDIA drivers. Do not run these scripts on the machine where AI Center will be installed.

The scripts below work best on vanilla or minimal machines where none of the dependencies (like Docker or NVIDIA drivers) have been preinstalled, and no unusual customizations have been done.

🚧

Warning:

Docker images can be many gigabytes in size, so the folder Docker uses to hold its files on Linux must be on a partition large enough to not run out of space. By default, this folder is on the root partition.

To see how large your root partition is, type the following in the terminal, and look for the line with a / in the rightmost column:

df -h

If the size of that partition is smaller than the minimal storage requirements, then see the Configuring the Docker Data Folder section.
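
The same check can be scripted. The sketch below is an illustration only: it assumes the 30 GB figure from the Hardware Requirements section is the threshold you care about, and the function names are hypothetical.

```shell
# Minimum disk size in GB, taken from the Hardware Requirements section.
REQUIRED_GB=30

# Print the total size of the root partition in whole GB.
# df -P prints one line per filesystem; -k reports sizes in 1 KB blocks.
root_partition_size_gb() {
  df -Pk / | awk 'NR==2 { print int($2 / 1024 / 1024) }'
}

# Succeed if the first argument (size in GB) is at least the second.
has_enough_space() {
  [ "$1" -ge "$2" ] 2>/dev/null
}

size_gb="$(root_partition_size_gb)"
if has_enough_space "$size_gb" "$REQUIRED_GB"; then
  echo "Root partition is ${size_gb} GB: large enough."
else
  echo "Root partition is ${size_gb} GB: consider moving the Docker data folder."
fi
```

The script compares the total partition size, as the text above does. If other data already fills the partition, look at the available space column of df -h as well.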

GPU Machine Install


Linux

Run this command:

curl -fsSL https://raw.githubusercontent.com/UiPath/Infrastructure/master/ML/du_prereq_installer.sh | sudo bash -s -- --env gpu

🚧

Warning:

  • On some systems, you might need to run the command twice or reboot the system to install all requirements.
  • Azure specific: to use the NV-series virtual machines, either install the NVIDIA driver before executing the above command, or use a Driver Extension from Azure to install the NVIDIA driver that matches the GPU model of that tier.

Azure VMs

If you are installing on a VM in Azure, then use this command instead:

curl -fsSL https://raw.githubusercontent.com/UiPath/Infrastructure/master/ML/du_prereq_installer.sh | sudo bash -s -- --env gpu --cloud azure

CPU Machine Install


Linux

Run this command:

curl -fsSL https://raw.githubusercontent.com/UiPath/Infrastructure/master/ML/du_prereq_installer.sh | sudo bash -s -- --env cpu

Azure VMs

If you are installing on a VM in Azure, then use this command instead:

curl -fsSL https://raw.githubusercontent.com/UiPath/Infrastructure/master/ML/du_prereq_installer.sh | sudo bash -s -- --env cpu --cloud azure
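
The four Linux commands above differ only in the --env value and the optional --cloud azure flag. As a rough sketch, a small helper can build the argument string so the right variant is chosen in one place. The helper name prereq_args is hypothetical, and it only accepts the values that appear on this page.

```shell
INSTALLER_URL="https://raw.githubusercontent.com/UiPath/Infrastructure/master/ML/du_prereq_installer.sh"

# Hypothetical helper: print the installer arguments for a given
# environment (cpu or gpu) and an optional cloud (only "azure" appears above).
prereq_args() {
  env_type="$1"
  cloud="${2:-}"
  case "$env_type" in
    cpu|gpu) ;;
    *) echo "env must be cpu or gpu" >&2; return 1 ;;
  esac
  case "$cloud" in
    "") printf '%s\n' "--env $env_type" ;;
    azure) printf '%s\n' "--env $env_type --cloud azure" ;;
    *) echo "this sketch only knows --cloud azure" >&2; return 1 ;;
  esac
}

# Usage (runs the real installer, so it is left commented out):
#   curl -fsSL "$INSTALLER_URL" | sudo bash -s -- $(prereq_args gpu azure)
```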

Windows 10

  1. Download and install Docker Desktop. On recently updated versions of Windows 10, WSL 2 must be installed. If a dialog says "WSL 2 Installation is Incomplete", click the Restart button.
  2. Open PowerShell and run the following command:
docker plugin install uipath/davfs

When running Data Manager, you need to create a working folder for it (for example, named workdir) and include the path to it in the docker run command, after the -v flag. When doing this on Windows, Docker Desktop shows a notification asking whether to share the folder. Click Share it to proceed.

Configuring the Docker Data Folder (Linux only)


Run this command and then reboot:

curl -fsSL https://raw.githubusercontent.com/UiPath/Infrastructure/master/ML/du_prereq_installer.sh | sudo bash -s -- --change-mount </path/to/folder>

🚧

Warning:

After changing the Docker Data Folder, you need to run the prerequisites installer script again.
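
Once the installer has been run again, you may want to confirm where Docker keeps its files and which partition holds that folder. The sketch below uses the standard docker info query for DockerRootDir. This page does not say whether --change-mount changes that value or remounts the default folder, so the sketch also asks df which partition backs the folder. The helper names are hypothetical.

```shell
# Print the folder Docker reports as its data root.
docker_root_dir() {
  docker info --format '{{.DockerRootDir}}'
}

# Print the df line for the partition that holds the given folder.
partition_of() {
  df -Ph "$1" | awk 'NR==2'
}

# Usage (requires a running Docker daemon):
#   partition_of "$(docker_root_dir)"
```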

Docker Cheat Sheet


Docker helps ship software in Docker "images". A running instance of an image is called a container. A container can be stopped, removed, and started again as many times as needed, as long as the image is available.

Once the image is removed, it is lost. The only way to recover it is to pull it again from the registry it came from if it is still available there.

A running container is analogous to a small Virtual Machine, in that it has an internal filesystem and network interfaces, which are separate from the host machine filesystem and network. Folders and ports can be mapped from the container to the host using the -v and -p arguments, respectively.

In the table below you can find a list of common commands for the Docker command line.
See the Docker documentation for the full list of base Docker commands.

Command

Description

"docker login <registry name> -u <username> -p <password>"

Log in to a registry.

"docker pull <registry name>/<image name>:<image tag>"

Download an image from a registry. The tag latest is commonly used to refer to the latest version of an image.

"docker run -d -p 5000:80 <registry name>/<image name>:<image tag>"

OR

"docker run -d -p 5000:80 <image id>"

Run an image in detached mode, while mapping port 80 from inside the container to port 5000 on the host machine. Detached mode means the container does not block the terminal, so you can perform other operations on the same terminal.

"docker images"

List images present on your system.

"docker ps -a"

List all containers (both running and stopped).
The container id is used to refer to that container when one needs to stop it or remove it, for instance.

"docker stop <container id>"

Stop the container.
This command does not remove the container, but it is required before removing it.

"docker rm <container id>"

Remove the container.
The container must be stopped beforehand.

"docker logs <container id>"

Display the logs of the container.

"docker rmi <image id>"

Remove one or more images from the system.
This helps save storage space as images can take up a lot of space.

"docker container prune -f"

Remove all stopped containers.
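
Because a container must be stopped before it can be removed, the two commands are often run together. A minimal sketch, with remove_container as a hypothetical helper name:

```shell
# Hypothetical helper: stop a container, then remove it.
# docker rm is only attempted if docker stop succeeds.
remove_container() {
  docker stop "$1" && docker rm "$1"
}

# Usage:
#   docker ps -a                      # find the container id
#   remove_container <container id>
```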

Linux Terminal Cheat Sheet


Command

Description

"sudo <any_command>"

Run a command as administrator. Try this whenever you get a Permission Denied error.

"ifconfig"

Display information about the network interfaces in your system. Find the IP of your machine in the eth0 or docker0 sections.

"pwd"

Display the path to the current folder.

"ls"

List the content of a directory.

"cd <folder_name>"

Go to a different folder.

"mkdir <folder_name>"

Create a new folder.

Install Data Manager

Make sure you have the registry credentials handy. If you have not received the registry credentials, you need to contact your Sales representative and request a set of credentials be generated for you.

Then type the following in a PowerShell or command-line terminal (on Windows) or a shell terminal (on Linux):

docker login aiflprodweacr.azurecr.io -u <username> -p <password>
docker pull aiflprodweacr.azurecr.io/datamanager:latest
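
Passing the password with -p can leave it in your shell history. The docker login command also accepts --password-stdin, which reads the password from standard input instead. The sketch below shows that variant for a Linux shell. The environment variable names REGISTRY_USER and REGISTRY_PASSWORD are assumptions made for this example.

```shell
REGISTRY="aiflprodweacr.azurecr.io"

# Log in by piping the password on stdin instead of passing it with -p.
# REGISTRY_USER and REGISTRY_PASSWORD are assumed to be set by the caller.
registry_login() {
  printf '%s' "$REGISTRY_PASSWORD" |
    docker login "$REGISTRY" -u "$REGISTRY_USER" --password-stdin
}

# Usage:
#   export REGISTRY_USER='<username>' REGISTRY_PASSWORD='<password>'
#   registry_login && docker pull "$REGISTRY/datamanager:latest"
```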

For more details about how to use Data Manager see this documentation page.

🚧

Running on the same machine as AI Center

Data Manager standalone container cannot run on the same machine as AI Center.

Launching Data Manager


To launch Data Manager, use the following command:

docker run -d -p <port_number>:80 -v "<path_to_working_folder>:/app/data" aiflprodweacr.azurecr.io/datamanager:latest --license-agreement accept

Open a web browser and enter the following URL: http://localhost:<port_number>.

If you are using the browser on a different machine, replace localhost with the IP address of the machine where the datamanager container is running.
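
On Linux, the launch steps can be wrapped in a small script that creates the working folder first. This is a sketch under stated assumptions, not an official launcher: launch_data_manager is a hypothetical name, and the port and folder in the usage comment are examples.

```shell
IMAGE="aiflprodweacr.azurecr.io/datamanager:latest"

# Hypothetical helper: create the working folder if needed, then start
# Data Manager with the same options as the command above.
launch_data_manager() {
  port="$1"
  workdir="$2"
  case "$port" in
    ''|*[!0-9]*) echo "port must be a number" >&2; return 1 ;;
  esac
  mkdir -p "$workdir" || return 1
  docker run -d -p "$port:80" -v "$workdir:/app/data" "$IMAGE" \
    --license-agreement accept
}

# Usage (Data Manager is then served at http://localhost:8080):
#   launch_data_manager 8080 "$HOME/workdir"
```

Pass an absolute path for the working folder. With the -v flag, Docker treats a value that is not an absolute path as the name of a volume rather than a host folder.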

Self-signed Certificates

If an ML Skill deployed in AI Center on-premises does not use a valid HTTPS certificate, you can use a command-line option to whitelist the root of your self-signed certificate.

The certificate must be in PEM format. As long as this requirement is met, the file extension does not matter.

The certificate has to exist inside the Docker container, so it has to be mounted. Mount the certificate file inside the container using the -v flag, and then specify the path to it:

docker run -d -p <port_number>:80 -v "<path_to_working_folder>:/app/data" -v "<path_to_certificate_file>":/custom.cer aiflprodweacr.azurecr.io/datamanager:latest --license-agreement accept --custom-root-cert="/custom.cer"

The path of the root certificate inside the container, in this case /custom.cer, has to be the same in the -v argument and in the --custom-root-cert argument. If one is changed, the other needs to be changed too.
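
Since the file extension says nothing about the format, a quick check before mounting the file can save a failed start. The sketch below is a heuristic only: it looks for the marker line that PEM certificates contain. The name is_pem_cert is hypothetical.

```shell
# Heuristic check: a PEM certificate is text containing this marker line.
is_pem_cert() {
  grep -q -e '-----BEGIN CERTIFICATE-----' "$1" 2>/dev/null
}

# Usage:
#   is_pem_cert /path/to/custom.cer || echo "not PEM; convert it first"
#
# If OpenSSL is installed, a stricter check is to parse the file:
#   openssl x509 -in /path/to/custom.cer -noout
```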

Airgapped environments (no internet access)


If you need to set up Data Manager on a machine with no internet access (airgapped), you need to run the above commands on some other machine that does have internet access.

Then you need to save the image as a .tar file, copy the file over to the airgapped machine, and then load it. This is done using the docker save and docker load commands described in the Docker documentation.

On the machine connected to the internet, first install Docker. Then, after running the docker login and docker pull commands above, run this command:

docker save -o datamanager-latest.tar aiflprodweacr.azurecr.io/datamanager:latest

Then you need to copy the .tar file to the airgapped machine, and then run this command in the same folder where the .tar file was saved:

docker load --input datamanager-latest.tar

Be aware that the .tar file will be large, on the order of a few gigabytes.
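
After loading, you can confirm that the image is available on the airgapped machine before trying to launch it. A minimal sketch using docker image inspect, which exits with an error when the image is not present locally. The helper name image_loaded is hypothetical.

```shell
IMAGE="aiflprodweacr.azurecr.io/datamanager:latest"

# Succeed if the image is present in the local Docker image store.
image_loaded() {
  docker image inspect "$1" >/dev/null 2>&1
}

# Usage on the airgapped machine, after docker load:
#   image_loaded "$IMAGE" && echo "ready to launch" || echo "image missing"
```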


