The table below lists the pre-trained and supported languages for each Document Understanding ML Package.
ML Package | Description | Pre-trained Languages | Supported Languages |
---|---|---|---|
UiPathDocumentOCR | Reads document text. | English | English |
DocumentUnderstanding | Extracts commonly occurring data points from any type of structured or semi-structured documents, building an ML model from scratch. | n/a | Latin-based languages |
DocumentClassifier | Classifies documents using a custom trained ML model. | n/a | Language-agnostic (supports any language as long as the OCR supports it) |
Invoices | Extracts commonly occurring data points from invoices. | English | Latin-based languages |
InvoicesAustralia | Extracts commonly occurring data points from Australian invoices. | English | Latin-based languages |
InvoicesIndia | Extracts commonly occurring data points from Indian invoices. | English | Latin-based languages |
InvoicesJapan | Extracts commonly occurring data points from Japanese invoices. | Japanese (Preview) | Japanese (Preview) |
InvoicesChina | Extracts commonly occurring data points from Chinese invoices. | Chinese (Preview) | Chinese (Preview) |
Receipts | Extracts commonly occurring data points from receipts. | English | Latin-based languages |
PurchaseOrders | Extracts commonly occurring data points from purchase orders. | English | Latin-based languages |
UtilityBills | Extracts commonly occurring data points from utility bills. | English | Latin-based languages |
IDCards | Extracts commonly occurring data points from ID cards. | Australia | Latin-based languages |
Passports | Extracts commonly occurring data points from passports. | International | International |
RemittanceAdvices | Extracts commonly occurring data points from remittance advices. | English | Latin-based languages |
DeliveryNotes | Extracts commonly occurring data points from delivery notes. | English | Latin-based languages |
W2 | Extracts commonly occurring data points from W-2 forms. | English | Latin-based languages |
W9 | Extracts commonly occurring data points from W-9 forms. | English | Latin-based languages |
ACORD125 | Extracts commonly occurring data points from ACORD125 forms. | English | Latin-based languages |
I9 | Extracts commonly occurring data points from I9 forms. | English | Latin-based languages |
990 | Extracts commonly occurring data points from 990 forms. | English | Latin-based languages |
4506T | Extracts commonly occurring data points from 4506-T forms. | English | Latin-based languages |
FM1003 | Extracts commonly occurring data points from FM1003 Loan Application forms. | English | Latin-based languages |
FormExtractor | Provides the endpoint required by the Form Extractor activity. | n/a | Latin-based languages |
IntelligentFormExtractor | Provides the endpoint required by the Intelligent Form Extractor activity. | n/a | Latin-based languages |
IntelligentKeywordClassifier | Provides the endpoint required by the Intelligent Keyword Classifier activity. | n/a | Latin-based languages |
HandwritingRecognition | Reads handwritten text. | English | English |
Observations
- To train a model on Japanese documents, use either the DocumentUnderstanding package or the InvoicesJapan package.
- To train a model on Chinese documents, use either the DocumentUnderstanding package or the InvoicesChina package.
- To train a model on Latin-script documents, use any package except InvoicesJapan and InvoicesChina.
- For supported languages, retraining may be required to reach the expected accuracy if your documents differ considerably from the model's original training dataset.
- For supported languages that a package is not pre-trained on, you can train a model with your own data in AI Center, provided the OCR engine also supports the language.
- Automatic reformatting of dates into the standard `yyyy-mm-dd` format for Asian languages is currently supported only for Japanese. For documents in other Asian languages, you can extract dates as the String content type and format them in the RPA workflow.
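
As an illustration of the last observation, the sketch below shows one way a date extracted as a plain string could be normalized to `yyyy-mm-dd` after extraction. It is written in Python only for illustration; the function name `normalize_ymd` and the year-month-day pattern are assumptions rather than part of any ML Package or activity, and the same logic would need to be expressed in whatever language your workflow uses. It also assumes the extracted string lists the year first, as in `2023年5月7日` or `2023년 5월 7일`.

```python
import re
from datetime import date
from typing import Optional

# Assumed input shape: a 4-digit year, then month, then day, separated by
# any non-digit characters (for example 年/月/日, 년/월/일, "/", "-", ".").
_YMD_PATTERN = re.compile(r"(\d{4})\D+(\d{1,2})\D+(\d{1,2})")


def normalize_ymd(raw: str) -> Optional[str]:
    """Return the date found in `raw` as yyyy-mm-dd, or None if none is valid."""
    match = _YMD_PATTERN.search(raw)
    if match is None:
        return None
    year, month, day = (int(group) for group in match.groups())
    try:
        # date() rejects impossible values such as month 13 or day 40.
        return date(year, month, day).isoformat()
    except ValueError:
        return None


if __name__ == "__main__":
    for sample in ["2023年5月7日", "2023년 5월 7일", "2023.12.01", "not a date"]:
        print(sample, "->", normalize_ymd(sample))
```

Returning `None` instead of raising lets the workflow route unparseable values to manual validation rather than failing outright. Dates written in other orders or in non-Gregorian calendars are outside the scope of this sketch.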