2020.10

DEPRECATED

Document Understanding User Guide

Last updated Feb 4, 2025

OCR Services

About OCR Services

OCR services are used for the following purposes:

At data labeling time, when importing documents into Data Manager. The services available for this step are UiPath Document OCR (free, in the cloud or on-premises), Google Cloud OCR (cloud only), Microsoft Read OCR (cloud or on-premises), and OmniPage (on-premises only).
At run time, when calling models from RPA workflows. The services available for this step are all the OCR engines integrated with the UiPath RPA platform, including the above, plus ABBYY FineReader, Microsoft OCR (legacy), Microsoft Project Oxford OCR, and Tesseract.

In production, we recommend calling the OCR using the Digitize Document activity in your workflow and passing the Document Object Model as input to the activity calling the ML model. For this purpose, you need to use the Machine Learning Extractor activity (Official feed).

As a quick convenience for testing purposes, you can also configure the OCR directly in AI Center (Settings window), but this is not recommended for production deployments.

On Premises Deployment Options

UiPath Document OCR has four deployment options available:

On the robot using a LocalServer activity package and the UiPath.OCR.Activities package version 3.1.0-preview or later - requires no internet access and no additional hardware but the Robot machine needs a CPU with AVX2 support.
- This should be your default option. For larger volumes you can add more Robots.
Standalone Docker container running on Linux GPU machine (see below - recommended for volumes over 1M pages/yr) - Internet access required for licensing/metering
- This should be your default option for large volumes over 2-3M pages per year.
Standalone Docker container running on Linux CPU machine (see below) - Internet access required for licensing/metering
- Only for rare situations where your Robot machines run on CPUs without AVX2 support, or where GPU cannot be obtained.
ML Skill in AI Center (see ML Packages section) (GPU strongly recommended) - Internet access not required on premises if AI Center installation is airgapped

Requirements

This section details the hardware and software requirements for installing OCR Engines.

Hardware Requirements

Machines Involved: VM in the cloud, on-premises box, or laptop
Operating Systems: Windows (Windows 10) or Linux (Ubuntu/CentOS/RedHat)
Computing Engines: CPU or GPU
OCR: UiPath Document OCR CPU or UiPath Document OCR GPU or OmniPage OCR CPU

OCR Engine	CPU Cores	RAM (GB)	Video RAM (GB)	HDD (GB)
UiPath CPU	8	8		50
UiPath GPU	1	4	8	50
OmniPage CPU	1	2		30

Software Requirements

The software requirements for OCR Engines are the same as for Data Manager.

Network Configuration

Data Manager needs access to the OCR engine at <IP>:<port_number>. The OCR engine can be UiPath Document OCR on-premises, OmniPage OCR on-premises, Google Cloud Vision OCR, Microsoft Read in Azure, or Microsoft Read on-premises.

Robots need access to the OCR engine at <IP>:<port_number>. The same OCR options apply as above, except for OmniPage, which is available to Robots directly as an Activity Pack.

OCR engines need access to the Licensing server hosted by UiPath in Azure, on port 443.
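The reachability requirements above can be checked from the Data Manager or Robot machine before going further. The following is a minimal sketch, assuming `curl` is installed; the function names and the example address are hypothetical and not part of any UiPath tooling.

```shell
# Build the base URL of an OCR engine from its host (or IP) and port.
ocr_base_url() {
    # $1 = host or IP address, $2 = port number
    printf 'http://%s:%s\n' "$1" "$2"
}

# Return 0 if something answers at the given URL within 5 seconds.
# Any HTTP status counts as reachable; only connection-level
# failures (refused, timeout, DNS) produce a non-zero exit code.
endpoint_reachable() {
    curl --silent --output /dev/null --max-time 5 "$1"
}

# Example with a hypothetical address (not executed automatically):
#   endpoint_reachable "$(ocr_base_url 10.0.0.5 5000)" && echo "OCR engine reachable"
```

This only confirms that a listener responds at the address; it does not validate licensing or that the listener is actually an OCR engine.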

Minimal Trial or Proof-of-Concept Configuration

If you only want to serve pre-trained out-of-the-box models, you can run an OCR engine on your Windows 10 laptop. Make sure Docker Desktop has 8 GB of RAM available.

If you want to try training a custom model as a demo on a small volume of data (under 100 documents), you can run the OCR engine in an environment limited to 4 GB of RAM. For small cases like this, a GPU for the OCR engine may not be necessary.

Prerequisites

OCR Engines are containerized applications that run on top of Docker. You cannot run them on the same machine as AI Center on-premises. To run them on a separate machine, use the prerequisites installer commands below to set up Docker and, optionally, the NVIDIA drivers. Do not run these scripts on the machine where AI Center will be installed.

The prerequisites for OCR Engines are the same as for Data Manager.

(Optional) GPU Machine Install

Linux

Run this command:

curl -fsSL https://raw.githubusercontent.com/UiPath/Infrastructure/master/ML/du_prereq_installer.sh | sudo bash -s -- --env gpu

On some systems, you might need to run the command twice or reboot the system to install all requirements.

Azure Specific: To use the NV-series virtual machines, either install the NVIDIA driver before executing the above command, or use a Driver Extension from Azure to install the NVIDIA driver that matches the GPU model of that VM tier.

Azure VMs

If you are installing on a VM in Azure, then use this command instead:

curl -fsSL https://raw.githubusercontent.com/UiPath/Infrastructure/master/ML/du_prereq_installer.sh | sudo bash -s -- --env gpu --cloud azure
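After the installer finishes, it can help to confirm that the expected tools are on the PATH before pulling any images. The sketch below assumes the script installs the `docker` CLI and, on GPU machines, that the NVIDIA driver provides `nvidia-smi`; the source does not list the installed binaries, and the function names are hypothetical.

```shell
# Return 0 if the named command is available on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Print the status of each named prerequisite.
# Returns non-zero if at least one of them is missing.
check_prereqs() {
    missing=0
    for tool in "$@"; do
        if have_tool "$tool"; then
            echo "ok: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example for a GPU machine (assumed tool names, not executed automatically):
#   check_prereqs docker nvidia-smi
```

If a tool is reported missing right after installation, rerunning the installer or rebooting, as noted above, is the first thing to try.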

Installation

UiPath Document OCR (Preview)

UiPath Document OCR is a proprietary OCR technology of UiPath, supporting characters used by the following Latin script languages: English, French, German, Italian, Portuguese, Romanian, and Spanish. Text in other languages will be recognized but without accents, for instance, “Ł” in Polish will be recognized as “L”. Pages processed using UiPath Document OCR are not counted towards the page quota purchased along with the Document Understanding Enterprise license so UiPath Document OCR is free to use.

UiPath Document OCR is available both on-premises as a Docker container and in the cloud as a cloud service API at the URL https://du.uipath.com/ocr. See the full description of the available URLs on the Public Endpoints page.

To install UiPath Document OCR, run these commands:

docker login aiflprodweacr.azurecr.io -u *** -p **
docker pull aiflprodweacr.azurecr.io/uipath-ocr:latest

Run using CPUs

docker run -d -p 5000:80 aiflprodweacr.azurecr.io/uipath-ocr:latest LicenseAgreement=accept

Run using GPU

docker run -d -p 5000:80 --gpus all aiflprodweacr.azurecr.io/uipath-ocr:latest LicenseAgreement=accept
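The CPU and GPU commands differ only in the `--gpus all` flag, and the host port (5000 here) is the `<port_number>` that Data Manager and Robots later connect to. As an illustration, the sketch below prints the matching command for a chosen mode and host port without running it; the helper name `ocr_run_cmd` and the `OCR_IMAGE` variable are hypothetical conveniences, not UiPath tooling.

```shell
# Image name as given in the installation commands above.
OCR_IMAGE="aiflprodweacr.azurecr.io/uipath-ocr:latest"

# Print (do not execute) the docker run command for UiPath Document OCR.
# $1 = mode: "cpu" or "gpu"
# $2 = host port to publish (defaults to 5000); the container listens on 80
ocr_run_cmd() {
    mode="$1"
    host_port="${2:-5000}"
    case "$mode" in
        cpu) gpu_flag="" ;;
        gpu) gpu_flag="--gpus all " ;;
        *)   echo "usage: ocr_run_cmd cpu|gpu [host_port]" >&2; return 1 ;;
    esac
    printf 'docker run -d -p %s:80 %s%s LicenseAgreement=accept\n' \
        "$host_port" "$gpu_flag" "$OCR_IMAGE"
}

# Example: print the GPU variant published on host port 5001.
#   ocr_run_cmd gpu 5001
```

Printing the command first makes it easy to review the port mapping before starting the container.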

In AI Center, when creating a new ML Package, at the bottom of the screen there is the OCR configuration section where you can define the OCR Engine type, the OCR URL, and the OCR Key. The OCR Key is the API Key you obtain from the Licenses section of your Automation Cloud account.

Important: The UiPath Document OCR container and the OmniPage OCR container cannot run on the same machine as AI Center on-premises.

OmniPage OCR

The OmniPage Docker container is intended to be used only with Data Manager, for importing documents in languages that UiPath Document OCR does not yet support.

Run these commands:

docker login aiflprodweacr.azurecr.io -u *** -p ***
docker pull aiflprodweacr.azurecr.io/omnipage-ocr:latest
docker run -d -p 5100:80 aiflprodweacr.azurecr.io/omnipage-ocr:latest LicenseAgreement=accept

Google Cloud OCR

The endpoint can be obtained from the Google Cloud Platform documentation. The ApiKey can be obtained from your Google Cloud Platform Console if you have a Google Cloud Vision service in your subscription.

Microsoft Read

Important: Applicable to both Azure and on-premises container endpoints.

In the case of Azure services, you need to provide both the Endpoint and the ApiKey.

In the case of on-premises container endpoints, API Key is not necessary.

Configuring OCR Service in Data Manager and AI Center Document Understanding ML Packages

The table below shows how to configure the supported OCR engine types in both Data Manager and AI Center.

Important: The ocr.method argument corresponds to the OCR Engine dropdown in the ML Package creation view in AI Center.

OCR Engine	ocr.method	ocr.key	ocr.url
UiPath	uipath	UiPath Automation Cloud Document Understanding API Key Enterprise Plan	`http://<IP_addr>:<port_number>`
OmniPage	omnipage	UiPath Automation Cloud Document Understanding API Key Enterprise Plan	`http://<IP_addr>:<port_number>`
Google	google	GCP Console API Key	`https://vision.googleapis.com/v1/images:annotate`
Microsoft Read 2.0 On-Prem	microsoft	None	`http://<IP_addr>:<port_number>/vision/v2.0/read/core/Analyze`
Microsoft Read 2.0 Azure	microsoft	API Key for your resource from Azure Portal	`<Azure_resource_Endpoint>/vision/v2.0/read/core/asyncBatchAnalyze`
Microsoft Read 3.1 On-Prem	microsoft	None	`http://<IP_addr>:<port_number>/vision/v3.1/read/analyze`
Microsoft Read 3.1 Azure	microsoft	API Key for your resource from Azure Portal	`<Azure_resource_Endpoint>/vision/v3.1/read/analyze`
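For the on-premises rows of the table, the ocr.url value is the engine's `http://<IP_addr>:<port_number>` base, plus a version-specific path for Microsoft Read. A minimal sketch of that mapping follows; the function name and the engine labels (`microsoft-2.0`, `microsoft-3.1`) are hypothetical shorthand for the table rows, not values accepted by Data Manager or AI Center.

```shell
# Print the ocr.url value for an on-premises OCR engine.
# $1 = engine label: uipath | omnipage | microsoft-2.0 | microsoft-3.1
# $2 = IP address of the OCR machine
# $3 = published port number
onprem_ocr_url() {
    base="http://$2:$3"
    case "$1" in
        uipath|omnipage) echo "$base" ;;
        microsoft-2.0)   echo "$base/vision/v2.0/read/core/Analyze" ;;
        microsoft-3.1)   echo "$base/vision/v3.1/read/analyze" ;;
        *) echo "unknown engine: $1" >&2; return 1 ;;
    esac
}

# Example with a hypothetical address:
#   onprem_ocr_url microsoft-3.1 10.0.0.5 5000
```

The Google and Azure rows are left out because their URLs come from the cloud provider rather than from an IP address and port.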
