Document Understanding
2022.4
Supported Languages - Standalone 2022.4
Last updated Oct 19, 2023

Supported Languages

The pre-trained and supported languages for each Document Understanding ML package are listed below. An n/a entry means the package has no pre-trained languages. For IDCards and Passports, the pre-trained entries list countries rather than languages.

UiPathDocumentOCR
  • Description: Reads document text.
  • Pre-trained languages: English, French, German, Italian, Portuguese, Romanian, Spanish
  • Supported languages: English, French, German, Italian, Portuguese, Romanian, Spanish

DocumentUnderstanding
  • Description: Extracts commonly occurring data points from any type of structured or semi-structured document, using an ML model built from scratch.
  • Pre-trained languages: n/a
  • Supported languages: Latin-based languages, Cyrillic languages, Greek left-to-right, Japanese (Preview), Chinese

DocumentClassifier
  • Description: Classifies documents using a custom-trained ML model.
  • Pre-trained languages: n/a
  • Supported languages: Language-agnostic (supports any language that the OCR engine supports)

Invoices
  • Description: Extracts commonly occurring data points from invoices.
  • Pre-trained languages: English, French, German, Portuguese, Romanian, Spanish
  • Supported languages: Latin-based languages, Cyrillic languages, Greek left-to-right

InvoicesAustralia
  • Description: Extracts commonly occurring data points from Australian invoices.
  • Pre-trained languages: English
  • Supported languages: Latin-based languages, Cyrillic languages, Greek left-to-right

InvoicesIndia
  • Description: Extracts commonly occurring data points from Indian invoices.
  • Pre-trained languages: English
  • Supported languages: Latin-based languages, Cyrillic languages, Greek left-to-right

InvoicesJapan
  • Description: Extracts commonly occurring data points from Japanese invoices.
  • Pre-trained languages: Japanese (Preview)
  • Supported languages: Japanese (Preview)

InvoicesChina
  • Description: Extracts commonly occurring data points from Chinese invoices.
  • Pre-trained languages: Chinese (Preview)
  • Supported languages: Chinese (Preview)

Receipts
  • Description: Extracts commonly occurring data points from receipts.
  • Pre-trained languages: English, Finnish, French, German, Norwegian, Romanian, Spanish
  • Supported languages: Latin-based languages, Cyrillic languages, Greek left-to-right

PurchaseOrders
  • Description: Extracts commonly occurring data points from purchase orders.
  • Pre-trained languages: English, German
  • Supported languages: Latin-based languages, Cyrillic languages, Greek left-to-right

UtilityBills
  • Description: Extracts commonly occurring data points from utility bills.
  • Pre-trained languages: English
  • Supported languages: Latin-based languages, Cyrillic languages, Greek left-to-right

IDCards
  • Description: Extracts commonly occurring data points from ID cards.
  • Pre-trained (countries): Australia, Austria, Belgium, Canada, Croatia, Cyprus, Finland, France, Germany, Hong Kong, Hungary, India, Italy, Netherlands, Poland, Romania, Spain, Switzerland, United Kingdom, USA (all 50 states plus DC)
  • Supported languages: Latin-based languages, Cyrillic languages, Greek left-to-right

Passports
  • Description: Extracts commonly occurring data points from passports.
  • Pre-trained (countries): International
  • Supported: International

RemittanceAdvices
  • Description: Extracts commonly occurring data points from remittance advices.
  • Pre-trained languages: English
  • Supported languages: Latin-based languages, Cyrillic languages, Greek left-to-right

DeliveryNotes
  • Description: Extracts commonly occurring data points from delivery notes.
  • Pre-trained languages: English, German
  • Supported languages: Latin-based languages, Cyrillic languages, Greek left-to-right

W2
  • Description: Extracts commonly occurring data points from W-2 forms.
  • Pre-trained languages: English
  • Supported languages: Latin-based languages, Cyrillic languages, Greek left-to-right

W9
  • Description: Extracts commonly occurring data points from W-9 forms.
  • Pre-trained languages: English, Spanish
  • Supported languages: Latin-based languages, Cyrillic languages, Greek left-to-right

FormExtractor
  • Description: Provides the endpoint required by the Form Extractor activity.
  • Pre-trained languages: n/a
  • Supported languages: Latin-based languages, Cyrillic languages, Greek left-to-right, Asian languages

IntelligentFormExtractor
  • Description: Provides the endpoint required by the Intelligent Form Extractor activity.
  • Pre-trained languages: n/a
  • Supported languages: Latin-based languages, Cyrillic languages, Greek left-to-right, Asian languages

IntelligentKeywordClassifier
  • Description: Provides the endpoint required by the Intelligent Keyword Classifier activity.
  • Pre-trained languages: n/a
  • Supported languages: Latin-based languages, Cyrillic languages, Greek left-to-right, Asian languages

HandwritingRecognition
  • Description: Reads handwritten text.
  • Pre-trained languages: English
  • Supported languages: English

Observations

  • To train a model on Japanese documents, use either the DocumentUnderstanding package or the InvoicesJapan package.
  • To train a model on Chinese documents, use either the DocumentUnderstanding package or the InvoicesChina package.
  • To train a model on Latin-script documents, use any package except InvoicesJapan and InvoicesChina.
  • For supported languages, retraining may be required to reach the expected accuracy if your documents differ considerably from the dataset the model was originally trained on.
  • For supported languages that the model is not pre-trained on, you can train a model on your own data in AI Center, provided the OCR engine also supports the language.
  • Among Asian languages, automatic reformatting of dates to the standard yyyy-mm-dd format is currently supported only for Japanese. For documents in other Asian languages, extract dates with the String content type and reformat them in the RPA workflow.
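The last observation leaves date formatting for non-Japanese Asian documents to the RPA workflow. The following Python sketch illustrates only that post-processing logic. It is not a UiPath activity or API. In a real workflow you would express the same steps in the workflow's own expressions. The function name `to_iso_date` and its regular expression are hypothetical. They assume dates are written year, month, day, for example 2023年10月19日 or 2023/10/19. Adjust the pattern to the date formats that actually appear in your documents.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical pattern (an assumption, not part of Document Understanding):
# matches year-month-day dates such as "2023年10月19日", "2023/10/19",
# "2023-10-19" or "2023.10.19", with optional spaces around separators.
_DATE_PATTERN = re.compile(
    r"(?P<year>\d{4})\s*[年/.\-]\s*(?P<month>\d{1,2})\s*[月/.\-]\s*(?P<day>\d{1,2})\s*日?"
)

# Maps full-width digits (e.g. "２０２３") to ASCII digits.
_FULL_WIDTH_DIGITS = str.maketrans("０１２３４５６７８９", "0123456789")


def to_iso_date(raw: str) -> Optional[str]:
    """Return the first date found in an extracted String value as yyyy-mm-dd, or None."""
    text = raw.translate(_FULL_WIDTH_DIGITS)
    match = _DATE_PATTERN.search(text)
    if match is None:
        return None
    try:
        # date() validates the values, so an impossible date such as 2023/2/30 is rejected.
        parsed = date(int(match["year"]), int(match["month"]), int(match["day"]))
    except ValueError:
        return None
    return parsed.isoformat()


if __name__ == "__main__":
    for sample in ["开票日期：2023年10月19日", "２０２３年１月５日", "2023/2/30", "no date here"]:
        print(sample, "->", to_iso_date(sample))
```

Returning None rather than raising lets the workflow route documents with unreadable or invalid dates to human validation instead of failing the whole run.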
© 2005-2023 UiPath. All rights reserved.