- UiPath.Abbyy.Activities
- UiPath.AbbyyEmbedded.Activities
- UiPath.DocumentUnderstanding.ML.Activities
- UiPath.DocumentUnderstanding.OCR.LocalServer.Activities
- UiPath.IntelligentOCR.Activities
- UiPath.OCR.Activities
- UiPath.OCR.Contracts
- UiPath.DocumentProcessing.Contracts
- UiPath.OmniPage.Activities
- UiPath.PDF.Activities
Supported Languages
The table below lists the pre-trained and the supported languages for each Document Understanding ML Package.
| ML Package | Description | Pre-trained Languages | Supported Languages |
|---|---|---|---|
| UiPathDocumentOCR | Reads document text. | English, French, German, Italian, Portuguese, Romanian, Spanish | English, French, German, Italian, Portuguese, Romanian, Spanish |
| DocumentUnderstanding | Extracts commonly occurring data points from any type of structured or semi-structured documents, building an ML model from scratch. | n/a | Latin-based languages Cyrillic languages Greek left-to-right Japanese (Preview) Chinese (Preview) |
| DocumentClassifier | Classifies documents using a custom trained ML model. | n/a | Language-agnostic (supports any language as long as the OCR supports it) |
| Invoices | Extracts commonly occurring data points from invoices. | English, French, German, Portuguese, Romanian, Spanish | Latin-based languages Cyrillic languages Greek left-to-right |
| InvoicesAustralia | Extracts commonly occurring data points from Australian invoices. | English | Latin-based languages Cyrillic languages Greek left-to-right |
| InvoicesIndia | Extracts commonly occurring data points from Indian invoices. | English | Latin-based languages Cyrillic languages Greek left-to-right |
| InvoicesJapan | Extracts commonly occurring data points from Japanese invoices. | Japanese (Preview) | Japanese (Preview) |
| InvoicesChina | Extracts commonly occurring data points from Chinese invoices. | Chinese (Preview) | Chinese (Preview) |
| Receipts | Extracts commonly occurring data points from receipts. | English, Finnish, French, German, Norwegian, Romanian, Spanish | Latin-based languages Cyrillic languages Greek left-to-right |
| PurchaseOrders | Extracts commonly occurring data points from purchase orders. | English, German | Latin-based languages Cyrillic languages Greek left-to-right |
| UtilityBills | Extracts commonly occurring data points from utility bills. | English | Latin-based languages Cyrillic languages Greek left-to-right |
| IDCards | Extracts commonly occurring data points from ID cards. | Australia, Austria, Belgium, Canada, Croatia, Cyprus, Finland, France, Germany, Hong Kong, Hungary, India, Italy, Netherlands, Poland, Romania, Spain, Switzerland, United Kingdom, USA (all 50 states plus DC) | Latin-based languages Cyrillic languages Greek left-to-right |
| Passports | Extracts commonly occurring data points from passports. | International | International |
| RemittanceAdvices | Extracts commonly occurring data points from remittance advices. | English | Latin-based languages Cyrillic languages Greek left-to-right |
| DeliveryNotes | Extracts commonly occurring data points from delivery notes. | English, German | Latin-based languages Cyrillic languages Greek left-to-right |
| W2 | Extracts commonly occurring data points from W-2 forms. | English | Latin-based languages Cyrillic languages Greek left-to-right |
| W9 | Extracts commonly occurring data points from W-9 forms. | English, Spanish | Latin-based languages Cyrillic languages Greek left-to-right |
| FormExtractor | Provides the endpoint required by the Form Extractor activity. | n/a | Latin-based languages Cyrillic languages Greek left-to-right Asian languages |
| IntelligentFormExtractor | Provides the endpoint required by the Intelligent Form Extractor activity. | n/a | Latin-based languages Cyrillic languages Greek left-to-right Asian languages |
| IntelligentKeywordClassifier | Provides the endpoint required by the Intelligent Keyword Classifier activity. | n/a | Latin-based languages Cyrillic languages Greek left-to-right Asian languages |
| HandwritingRecognition | Reads handwritten text. | English | English |
Observations
- To train a model on Japanese documents, use either the DocumentUnderstanding package or the InvoicesJapan package.
- To train a model on Chinese documents, use either the DocumentUnderstanding package or the InvoicesChina package.
- To train a model on Latin script documents, use any package except for InvoicesJapan or InvoicesChina.
- For the supported languages, retraining may be required to get the expected accuracy if the documents are considerably different from the original model training dataset.
- For supported languages on which a model is not pre-trained, you can train a model with your own data in AI Center, provided the OCR engine also supports the language.
- Automatic reformatting of dates into the standard `yyyy-mm-dd` format for Asian languages is currently supported only for Japanese. For documents in other Asian languages, you can extract the dates as the String content type and format them in the RPA workflow.
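
The last observation leaves date formatting to the workflow. As a rough illustration of that step, the sketch below normalizes a date extracted as a plain string into `yyyy-mm-dd`. It is written in Python only to show the logic; in an actual RPA workflow the same idea would be expressed in the workflow's own expression language. The helper name `normalize_date` and the set of accepted layouts are assumptions made for this example, not part of any UiPath package.

```python
import re
from datetime import date
from typing import Optional

# Assumption: dates appear in year-month-day order with one of a few common
# separators, e.g. "2023年5月7日", "2023년 5월 7일", "2023/5/7", "2023.05.07".
_DATE_PATTERN = re.compile(
    r"(\d{4})\s*[年년/.\-]\s*(\d{1,2})\s*[月월/.\-]\s*(\d{1,2})\s*[日일]?"
)


def normalize_date(raw: str) -> Optional[str]:
    """Return the first year-month-day date found in `raw` as yyyy-mm-dd.

    Returns None when no date is found or the values do not form a real
    calendar date (hypothetical helper for illustration only).
    """
    match = _DATE_PATTERN.search(raw)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        # date() validates the calendar values; isoformat() yields yyyy-mm-dd.
        return date(year, month, day).isoformat()
    except ValueError:
        return None
```

For instance, `normalize_date("2023年5月7日")` returns `"2023-05-07"`, while a string with no recognizable date returns `None`, so the workflow can route such values to manual validation. Layouts in other orders (day first, era-based years) would need their own handling.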