UiPath Document Understanding

Supported Languages

The table below lists the pre-trained languages and the supported languages for each Document Understanding ML Package.

| ML Package | Description | Pre-trained Languages | Supported Languages |
|---|---|---|---|
| UiPathDocumentOCR | Reads document text. | English, French, German, Italian, Portuguese, Romanian, Spanish | English, French, German, Italian, Portuguese, Romanian, Spanish |
| DocumentUnderstanding | Extracts commonly occurring data points from any type of structured or semi-structured documents, building an ML model from scratch. | n/a | Latin-based languages, Cyrillic languages, Greek left-to-right, Japanese (Preview), Chinese (Preview) |
| DocumentClassifier | Classifies documents using a custom trained ML model. | n/a | Language-agnostic (supports any language as long as the OCR supports it) |
| Invoices | Extracts commonly occurring data points from invoices. | English, French, German, Portuguese, Romanian, Spanish | Latin-based languages, Cyrillic languages, Greek left-to-right |
| InvoicesAustralia | Extracts commonly occurring data points from Australian invoices. | English | Latin-based languages, Cyrillic languages, Greek left-to-right |
| InvoicesIndia | Extracts commonly occurring data points from Indian invoices. | English | Latin-based languages, Cyrillic languages, Greek left-to-right |
| InvoicesJapan | Extracts commonly occurring data points from Japanese invoices. | Japanese (Preview) | Japanese (Preview) |
| InvoicesChina | Extracts commonly occurring data points from Chinese invoices. | Chinese (Preview) | Chinese (Preview) |
| Receipts | Extracts commonly occurring data points from receipts. | English, Finnish, French, German, Norwegian, Romanian, Spanish | Latin-based languages, Cyrillic languages, Greek left-to-right |
| PurchaseOrders | Extracts commonly occurring data points from purchase orders. | English, German | Latin-based languages, Cyrillic languages, Greek left-to-right |
| UtilityBills | Extracts commonly occurring data points from utility bills. | English | Latin-based languages, Cyrillic languages, Greek left-to-right |
| IDCards | Extracts commonly occurring data points from ID cards. | Australia, Austria, Belgium, Canada, Croatia, Cyprus, Finland, France, Germany, Hong Kong, Hungary, India, Italy, Netherlands, Poland, Romania, Spain, Switzerland, United Kingdom, USA (all 50 states plus DC) | Latin-based languages, Cyrillic languages, Greek left-to-right |
| Passports | Extracts commonly occurring data points from passports. | International | International |
| RemittanceAdvices | Extracts commonly occurring data points from remittance advices. | English | Latin-based languages, Cyrillic languages, Greek left-to-right |
| DeliveryNotes | Extracts commonly occurring data points from delivery notes. | English, German | Latin-based languages, Cyrillic languages, Greek left-to-right |
| W2 | Extracts commonly occurring data points from W-2 forms. | English | Latin-based languages, Cyrillic languages, Greek left-to-right |
| W9 | Extracts commonly occurring data points from W-9 forms. | English, Spanish | Latin-based languages, Cyrillic languages, Greek left-to-right |
| FormExtractor | Provides the endpoint required by the Form Extractor activity. | n/a | Latin-based languages, Cyrillic languages, Greek left-to-right, Asian languages |
| IntelligentFormExtractor | Provides the endpoint required by the Intelligent Form Extractor activity. | n/a | Latin-based languages, Cyrillic languages, Greek left-to-right, Asian languages |
| IntelligentKeywordClassifier | Provides the endpoint required by the Intelligent Keyword Classifier activity. | n/a | Latin-based languages, Cyrillic languages, Greek left-to-right, Asian languages |
| HandwritingRecognition | Reads handwritten text. | English | English |
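
The difference between the two language columns can be made concrete by encoding a few rows of the table as data. The following is a minimal Python sketch, not a UiPath API: the `PACKAGES` dictionary, the `LANGUAGE_GROUP` mapping, the `training_status` function, and the shortened group labels are all assumptions introduced for illustration, and only three packages are included.

```python
# Hypothetical encoding of a few rows from the table above (not a UiPath API).
PACKAGES = {
    "Invoices": {
        "pretrained": {"English", "French", "German",
                       "Portuguese", "Romanian", "Spanish"},
        "supported_groups": {"Latin-based", "Cyrillic", "Greek"},
    },
    "PurchaseOrders": {
        "pretrained": {"English", "German"},
        "supported_groups": {"Latin-based", "Cyrillic", "Greek"},
    },
    "InvoicesJapan": {
        "pretrained": {"Japanese"},
        "supported_groups": {"Japanese"},
    },
}

# Illustrative mapping from a language to the group named in the table.
# This list is an assumption and is far from complete.
LANGUAGE_GROUP = {
    "English": "Latin-based",
    "French": "Latin-based",
    "German": "Latin-based",
    "Italian": "Latin-based",
    "Russian": "Cyrillic",
    "Greek": "Greek",
    "Japanese": "Japanese",
}


def training_status(package: str, language: str) -> str:
    """Classify a language for a package as pre-trained, supported, or neither."""
    entry = PACKAGES[package]
    if language in entry["pretrained"]:
        return "pre-trained"
    # Unknown languages map to None, which is never in the supported set.
    if LANGUAGE_GROUP.get(language) in entry["supported_groups"]:
        return "supported, training required"
    return "not supported"
```

For example, `training_status("Invoices", "Italian")` reports that Italian is supported but needs training, because Italian is a Latin-based language that is missing from the Invoices pre-trained list.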

Observations

  • To train a model on Japanese documents, use either the DocumentUnderstanding package or the InvoicesJapan package.
  • To train a model on Chinese documents, use either the DocumentUnderstanding package or the InvoicesChina package.
  • To train a model on Latin script documents, use any package except for InvoicesJapan or InvoicesChina.
  • For supported languages, retraining may be required to reach the expected accuracy if your documents differ considerably from the model's original training dataset.
  • For supported languages that a model is not pre-trained on, you can train a model with your own data in AI Center, provided the OCR engine also supports the language.
  • Among Asian languages, automatic reformatting of dates into the standard yyyy-mm-dd format is currently supported only for Japanese. For documents in other Asian languages, you can extract the dates with the String content type and format them in the RPA workflow.
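
The last observation leaves date formatting to the workflow when a date is extracted as a String. The following is a minimal Python sketch of that post-processing step, not part of any UiPath package: the function name `normalize_date` and the regular expression are assumptions, and the pattern only covers dates written with year, month, and day markers as in Chinese or Korean documents. The same logic could be ported to whichever expression language the workflow uses.

```python
import re
from datetime import date
from typing import Optional

# Matches dates written with unit markers, e.g. "2023年5月17日" (Chinese)
# or "2023년 5월 17일" (Korean). The pattern is an illustrative assumption
# and does not cover every date style found in real documents.
_MARKED_DATE = re.compile(
    r"(\d{4})\s*[年년]\s*(\d{1,2})\s*[月월]\s*(\d{1,2})\s*[日일]"
)


def normalize_date(text: str) -> Optional[str]:
    """Return the first marked date in text as yyyy-mm-dd, or None."""
    match = _MARKED_DATE.search(text)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        # date() validates the calendar values, e.g. it rejects month 13.
        return date(year, month, day).isoformat()
    except ValueError:
        return None
```

Returning `None` for unparseable input lets the workflow route such values to manual validation instead of passing a wrong date downstream.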


