2020.10
DEPRECATED

Document Understanding User Guide

Last updated Jul 29, 2024

Install Data Manager

Before proceeding, make sure you meet the Requirements and install the Prerequisites.

Requirements

This section details the hardware and software requirements for installing Data Manager.

Hardware Requirements

  • Machines Involved: VM in the Cloud or On-Prem Box or Laptop

  • Operating Systems: Windows (Windows 10) or Linux (Ubuntu/CentOS/RedHat)

  • Computing Engines: CPU

  • OCR: Required

CPU Cores    RAM (GB)    HDD (GB)
1            4           30

Software Requirements

Linux Operating System

If you install the product on a VM in the cloud, the following operating systems are supported:

Software    Versions
Ubuntu      20.04 LTS, 18.04 LTS, 16.04 LTS
RHEL        7.x

If you install the product on a machine in an on-premises data center, the following operating systems are supported:

Software    Versions
Ubuntu      20.04 LTS, 18.04 LTS, 16.04 LTS
RHEL        7.x
CentOS      7.x

Windows Operating System

See the official Docker website for the list of Windows operating systems supported.

On Windows, virtualization must be enabled on your machine. We strongly recommend doing this only on physical machines such as laptops or desktop workstations. We do not support running Docker on Windows in Virtual Machines (Cloud or Datacenter) using Nested Virtualization.

Browsers

Software         Versions
Google Chrome    50+

Network Configuration

(Optional) If Prelabelling is needed, Data Manager needs access to AI Center on-premises or to public SaaS endpoints like https://du.uipath.com/ie/invoices.

Data Manager needs access to the OCR engine at <IP>:<port_number>. The OCR engine can be UiPath Document OCR on-premises, Omnipage OCR on-premises, Google Cloud Vision OCR, Microsoft Read Azure, or Microsoft Read on-premises.

Prerequisites

Data Manager is a containerized application that runs on top of Docker. You cannot run it on the same machine as AI Center on-premises. To run it on a separate machine, you only need to have Docker installed (on Linux) or Docker Desktop installed (on Windows).

Important: Docker images can be many GB in size, so the folder Docker uses to hold its files on Linux must be on a partition large enough to not run out of space. By default, this folder is on the root partition.
To see how large your root partition is, type the following in the terminal, and look for the line with a / in the rightmost column:
df -h

If the size of that partition is smaller than the minimal storage requirements, then see the Configuring the Docker Data Folder section.
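
The check above can also be scripted. The following is a minimal sketch, not part of the official tooling. It assumes that the 30 GB HDD figure from the hardware table is a reasonable threshold for free space on / (the guide does not state an exact free-space number), and free_gb is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Threshold in GB. Assumption: reuses the HDD figure from the hardware table.
required_gb=30

# Hypothetical helper: print the free space, in whole GB, of the
# filesystem that holds the given path (POSIX df output, 1 KB blocks).
free_gb() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

avail=$(free_gb /)
if [ "$avail" -lt "$required_gb" ]; then
  echo "Only ${avail} GB free on /: consider configuring the Docker data folder."
else
  echo "${avail} GB free on /."
fi
```

If the script reports too little space, the Configuring the Docker Data Folder section describes how to move Docker's files to another partition.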

Installing Docker

Linux

Follow instructions in the official Docker documentation, or run this command:

curl -fsSL https://raw.githubusercontent.com/UiPath/Infrastructure/master/ML/du_prereq_installer.sh | sudo bash -s -- --env cpu

If this command fails, your Linux operating system is incompatible with the script, and you need to ask your IT department to install Docker on the machine following the instructions in the official Docker documentation.

Azure VMs

If you are installing on a VM in Azure, then use this command instead:

curl -fsSL https://raw.githubusercontent.com/UiPath/Infrastructure/master/ML/du_prereq_installer.sh | sudo bash -s -- --env cpu --cloud azure

Windows 10

Download and install Docker Desktop. On recently updated versions of Windows 10, you will need WSL2 installed. So when presented with a dialog saying "WSL 2 Installation is Incomplete" please click the Restart button.

When running Data Manager, you need to create a working folder for each Docker container (perhaps named workdir for Data Manager) and include the path to it in the docker run command, after the -v flag. When you do this on Windows, Docker Desktop displays a notification asking whether to share the folder. Click Share it to proceed.


Configuring the Docker Data Folder (Linux Only)

Replace </path/to/folder> with the path to the folder where you want Docker to hold its files, run this command, and then reboot:

curl -fsSL https://raw.githubusercontent.com/UiPath/Infrastructure/master/ML/du_prereq_installer.sh | sudo bash -s -- --change-mount </path/to/folder>

Docker Cheat Sheet

Docker helps ship software in Docker "images". A running instance of an image is called a container. A container can be stopped, removed, and started again as many times as needed, as long as the image is available.

Once the image is removed, it is lost. The only way to recover it is to pull it again from the registry it came from if it is still available there.

A running container is analogous to a small Virtual Machine, in that it has an internal filesystem and network interfaces, which are separate from the host machine filesystem and network. Folders and ports can be mapped from the container to the host using the -v and -p arguments, respectively.

In the table below you can find a list of common commands for the Docker command line.

See the official Docker documentation for the full list of base Docker commands.

Command
    Description

docker login <registry name> -u <username> -p <password>
    Log in to a registry.

docker pull <registry name>/<image name>:<image tag>
    Download an image from a registry. The tag latest is commonly used to refer to the latest version of an image.

docker run -d -p 5000:80 <registry name>/<image name>:<image tag>
OR
docker run -d -p 5000:80 <image id>
    Run a container in detached mode, mapping port 80 inside the container to port 5000 on the host machine. To also map <host folder> to <container folder>, add -v <host folder>:<container folder>. Detached mode means the container does not block the terminal, so you can perform other operations on the same terminal.

docker images
    List images present on your system.

docker ps -a
    List all containers (both running and stopped).
      • The container id is used to refer to that container when you need to stop it or remove it, for instance.

docker stop <container id>
    Stop the container.
      • This command does not remove the container, but it is required before removing it.

docker rm <container id>
    Remove the container.
      • The container must be stopped beforehand.

docker logs <container id>
    Display the logs of the container.

docker rmi <image id>
    Remove one or more images from the system.
      • This helps save storage space, as images can take up a lot of space.

docker container prune -f
    Remove all stopped containers.

Linux Terminal Cheat Sheet

Command
    Description

sudo <any_command>
    Run a command as administrator. Try this whenever you get a Permission Denied error.

ifconfig
    Display information about the network interfaces in your system. Find the IP of your machine in the eth0 or docker0 sections.

pwd
    Display the path to the current folder.

ls
    List the content of a directory.

cd <folder_name>
    Go to a different folder.

mkdir <folder_name>
    Create a new folder.
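
The commands in this cheat sheet are enough to prepare the working folder that the docker run command expects after the -v flag. The sketch below is only an illustration: it assumes a folder named workdir, and it creates that folder under a throwaway temporary directory so it is safe to try. In practice you would pick your own location.

```shell
#!/usr/bin/env bash
# Assumption: a throwaway parent directory, so the example is safe to try.
parent=$(mktemp -d)

cd "$parent"        # go to the parent folder
mkdir workdir       # create the working folder
cd workdir          # enter it
pwd                 # print its full path, for use after the -v flag
ls                  # list its content (empty for a new folder)
```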

Install Data Manager

Make sure you have the registry credentials handy. If you have not received the registry credentials, you need to contact your Sales representative and request a set of credentials be generated for you.

Then type the following in a PowerShell or Command Prompt terminal (on Windows) or a shell terminal (on Linux):

docker login aiflprodweacr.azurecr.io -u <username> -p <password>
docker pull aiflprodweacr.azurecr.io/datamanager:latest
Important: The Data Manager standalone container cannot run on the same machine as AI Center.

Launching Data Manager

To launch Data Manager, use the following command:

docker run -d -p <port_number>:80 -v "<path_to_working_folder>:/app/data" aiflprodweacr.azurecr.io/datamanager:latest --license-agreement accept

Replace <port_number> with the port number where you want Data Manager to be accessible. Ports in the thousands are common, like 5000, 8000, 8080, 8081, etc. Replace <path_to_working_folder> with the local folder where you want Data Manager to keep all its internal configuration and data. Make sure that the Docker service has access to that folder.

After running this command, open a web browser and enter the following URL: http://localhost:<port_number>.
If you are using the browser on a different machine, replace localhost with the IP address of the machine where the datamanager container is running.

To run multiple Data Manager sessions, change the folder path and run the command again.
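
As an illustration of running several sessions, the sketch below prints one launch command per session instead of executing it, so you can review the commands first. It assumes that each session also gets its own host port, because a host port can be published by only one container at a time. The session names, the starting port, the base folder, and the launch_cmd helper are all hypothetical.

```shell
#!/usr/bin/env bash
image="aiflprodweacr.azurecr.io/datamanager:latest"

# Hypothetical helper: print the launch command for one session.
# $1 = host port, $2 = working folder on the host.
launch_cmd() {
  printf 'docker run -d -p %s:80 -v "%s:/app/data" %s --license-agreement accept\n' \
    "$1" "$2" "$image"
}

# Assumed layout: one working folder and one port per session.
base="$HOME/datamanager"
port=8080
for session in invoices receipts; do
  launch_cmd "$port" "$base/$session"
  port=$((port + 1))
done
```

Each printed command can then be pasted into the terminal once the corresponding working folder exists.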

Self-signed Certificates

If an ML Skill deployed in AI Center on-premises does not use a valid HTTPS certificate, you can use a command-line option to whitelist the root of your self-signed certificate.

The certificate needs to be in a PEM format. As long as this requirement is met, the file extension is insignificant.

The certificate has to exist inside the Docker container, so it has to be mounted. Mount the certificate file inside the container using the -v flag, then specify the path to it:

docker run -d -p <port_number>:80 -v "<path_to_working_folder>:/app/data" -v "<path_to_certificate_file>":/custom.cer aiflprodweacr.azurecr.io/datamanager:latest --license-agreement accept --custom-root-cert="/custom.cer"
Note: path_to_certificate_file does not support symlinks.
The path of the certificate inside the container, in this case /custom.cer, has to be the same in the -v argument and in the --custom-root-cert argument. If one is changed, the other needs to be changed too.
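
Before mounting the certificate, it can help to confirm the two requirements above. The following is a rough sketch, not an official check: is_mountable_pem is a hypothetical helper that only looks for the PEM header line and tests whether the file itself is a symlink. It does not validate the certificate, and it does not detect symlinked parent folders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the path is a regular, non-symlink file
# that contains a PEM certificate header. This is a quick sanity check,
# not a full validation of the certificate.
is_mountable_pem() {
  [ -f "$1" ] && [ ! -L "$1" ] &&
    grep -q -e '-----BEGIN CERTIFICATE-----' "$1"
}

cert="${1:-custom.cer}"   # assumed file name; pass your own path as $1
if is_mountable_pem "$cert"; then
  echo "$cert looks like a PEM certificate and is not a symlink."
else
  echo "$cert is missing, is a symlink, or is not PEM-encoded."
fi
```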

Airgapped Environments (no Internet Access)

If you need to set up Data Manager on a machine with no internet access (airgapped), you need to run the above commands on some other machine that does have internet access.

Then you need to save the image as a .tar file, copy the file over to the airgapped machine, and then load it. This is done using the docker save and docker load commands described in the Docker documentation.

On the machine connected to the internet, first install Docker. Then, after running the docker login and docker pull commands above, run this command:

docker save -o datamanager-latest.tar aiflprodweacr.azurecr.io/datamanager:latest

Then you need to copy the .tar file to the airgapped machine, and then run this command in the same folder where the .tar file was saved:

docker load --input datamanager-latest.tar

Be aware that the .tar file will be large: a few gigabytes.
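
Because the file is large and is copied between machines, you may want to confirm that it arrived intact before loading it. This step is not part of the guide's procedure. The sketch below assumes GNU coreutils sha256sum is available on both machines, and write_checksum and verify_checksum are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; assume GNU coreutils sha256sum is installed.

# On the connected machine, after docker save: record the checksum
# in a .sha256 file next to the .tar file.
write_checksum() {
  sha256sum "$1" > "$1.sha256"
}

# On the airgapped machine, before docker load: verify the copy.
# Run it from the folder that holds both the .tar and the .sha256 file.
verify_checksum() {
  sha256sum -c "$1.sha256"
}
```

For example, run write_checksum datamanager-latest.tar on the connected machine, copy both files over, and run verify_checksum datamanager-latest.tar on the airgapped machine. A failed verification suggests the copy should be repeated before running docker load.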
