Supported Languages
The table below lists the pre-trained languages and the supported languages for each Document Understanding ML Package.
ML Package | Description | Pre-trained Languages | Supported Languages |
---|---|---|---|
UiPathDocumentOCR | Reads document text. | | |
DocumentUnderstanding | Extracts commonly occurring data points from any type of structured or semi-structured documents, building an ML model from scratch. | n/a | |
DocumentClassifier | Classifies documents using a custom trained ML model. | n/a | |
Invoices | Extracts commonly occurring data points from invoices. | | |
InvoicesAustralia | Extracts commonly occurring data points from Australian invoices. | | |
InvoicesIndia | Extracts commonly occurring data points from Indian invoices. | | |
InvoicesJapan | Extracts commonly occurring data points from Japanese invoices. | | |
InvoicesChina | Extracts commonly occurring data points from Chinese invoices. | | |
Receipts | Extracts commonly occurring data points from receipts. | | Latin-based languages, Cyrillic languages, Greek left-to-right |
PurchaseOrders | Extracts commonly occurring data points from purchase orders. | English, German | Latin-based languages, Cyrillic languages, Greek left-to-right |
UtilityBills | Extracts commonly occurring data points from utility bills. | English | Latin-based languages, Cyrillic languages, Greek left-to-right |
IDCards | Extracts commonly occurring data points from ID cards. | Australia, Austria, Belgium, Canada, Croatia, Cyprus, Finland, France, Germany, Hong Kong, Hungary, India, Italy, Netherlands, Poland, Romania, Spain, Switzerland, United Kingdom, USA (all 50 states plus DC) | Latin-based languages, Cyrillic languages, Greek left-to-right |
Passports | Extracts commonly occurring data points from passports. | International | International |
RemittanceAdvices | Extracts commonly occurring data points from remittance advices. | English | Latin-based languages, Cyrillic languages, Greek left-to-right |
DeliveryNotes | Extracts commonly occurring data points from delivery notes. | English, German | Latin-based languages, Cyrillic languages, Greek left-to-right |
W2 | Extracts commonly occurring data points from W-2 forms. | English | Latin-based languages, Cyrillic languages, Greek left-to-right |
W9 | Extracts commonly occurring data points from W-9 forms. | English, Spanish | Latin-based languages, Cyrillic languages, Greek left-to-right |
FormExtractor | Provides the endpoint required by the Form Extractor activity. | n/a | Latin-based languages, Cyrillic languages, Greek left-to-right, Asian languages |
IntelligentFormExtractor | Provides the endpoint required by the Intelligent Form Extractor activity. | n/a | Latin-based languages, Cyrillic languages, Greek left-to-right, Asian languages |
IntelligentKeywordClassifier | Provides the endpoint required by the Intelligent Keyword Classifier activity. | n/a | Latin-based languages, Cyrillic languages, Greek left-to-right, Asian languages |
HandwritingRecognition | Reads handwritten text. | English | English |
Observations
- To train a model on Japanese documents, use either the DocumentUnderstanding package or the InvoicesJapan package.
- To train a model on Chinese documents, use either the DocumentUnderstanding package or the InvoicesChina package.
- To train a model on Latin script documents, use any package except for InvoicesJapan or InvoicesChina.
- For the supported languages, retraining may be required to get the expected accuracy if the documents are considerably different from the original model training dataset.
- For the supported languages which are not pre-trained by the model, you can train a model with your own data in AI Center, assuming the OCR engine supports it as well.
- For Asian languages, automatic reformatting of dates into the standard `yyyy-mm-dd` format is currently supported only for Japanese. For documents in other Asian languages, you can extract the dates as the String content type and format them in the RPA workflow.
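
The last observation suggests formatting dates yourself when they are extracted as plain strings. The following is a minimal sketch of that post-processing step, written in Python for illustration only. It is not part of any UiPath package, the function name `normalize_date` is hypothetical, and the accepted input shapes (Chinese and Korean year/month/day markers, plus slash, dot, or hyphen separators) are assumptions about what an extractor might return.

```python
import re
from datetime import date
from typing import Optional

# Assumed input shapes (illustrative, not taken from the product documentation):
#   "2023年5月7日"    Chinese year/month/day markers
#   "2023년 5월 7일"  Korean year/month/day markers
#   "2023/5/7", "2023.05.07", "2023-5-7"
_DATE_PATTERN = re.compile(
    r"(\d{4})\s*[年년/.\-]\s*(\d{1,2})\s*[月월/.\-]\s*(\d{1,2})\s*[日일]?"
)


def normalize_date(raw: str) -> Optional[str]:
    """Return the first year-month-day date found in `raw` as yyyy-mm-dd.

    Returns None when no date is found or when the numbers do not form
    a valid calendar date (for example month 13).
    """
    match = _DATE_PATTERN.search(raw)
    if match is None:
        return None
    year, month, day = (int(group) for group in match.groups())
    try:
        # date() validates the calendar date; isoformat() yields yyyy-mm-dd.
        return date(year, month, day).isoformat()
    except ValueError:
        return None


if __name__ == "__main__":
    for sample in ["2023年5月7日", "2023년 5월 7일", "2023/13/40"]:
        print(sample, "->", normalize_date(sample))
```

Returning `None` instead of raising keeps the decision about unparseable values (flag for human validation, keep the raw string, and so on) in the calling workflow. Inside an actual RPA workflow, the same logic would be expressed in the workflow's own expression language.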