The table below lists the pretrained and supported languages for each Document Understanding ML Package.
Observations
- To train a model on Japanese documents, use either the DocumentUnderstanding package or the InvoicesJapan package.
- To train a model on Chinese documents, use either the DocumentUnderstanding package or the InvoicesChina package.
- To train a model on Latin script documents, use any package except for InvoicesJapan or InvoicesChina.
- For the supported languages, retraining may be required to get the expected accuracy if the documents are considerably different from the original model training dataset.
- For supported languages on which a model is not pretrained, you can train a model with your own data in AI Center, provided the OCR engine also supports the language.
- Automatic reformatting of dates into the standard `yyyy-mm-dd` format for Asian languages is currently supported only for Japanese. For documents in other Asian languages, you can extract the dates as String content type and format them in the RPA workflow.
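
The date formatting mentioned in the last observation could look roughly like the following sketch. It is written in Python for illustration only; in an RPA workflow the same logic would be expressed with the workflow's own activities or expressions. The function name `normalize_cjk_date` and the accepted input pattern (year, month, and day markers as used in Chinese and Korean dates) are assumptions, not part of the product.

```python
import re
from datetime import date
from typing import Optional

# Assumed input shape: "2023年5月7日" (Chinese) or "2023년 5월 7일" (Korean).
_CJK_DATE = re.compile(
    r"(\d{4})\s*[年년]\s*(\d{1,2})\s*[月월]\s*(\d{1,2})\s*[日일]"
)


def normalize_cjk_date(raw: str) -> Optional[str]:
    """Return the date as yyyy-mm-dd, or None if it cannot be parsed."""
    match = _CJK_DATE.search(raw)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        # date() rejects impossible dates such as February 30.
        return date(year, month, day).isoformat()
    except ValueError:
        return None
```

Returning `None` instead of raising keeps unparseable values visible to the caller, so they can be routed to manual validation rather than silently passed on.
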
ML Package | Description | Pretrained/Supported Languages | GPU/Hardware Recommendations |
---|---|---|---|
DocumentUnderstanding | Extracts commonly occurring data points from any type of structured or semi-structured documents, building an ML model from scratch. This is not a pre-trained model. | Pretrained for: N/A Supported languages: Languages using Latin alphabet (A, B, C, etc) Languages using Cyrillic alphabet (А, Б, В, etc.) Greek left-to-right alphabet (A, B, Γ, etc.) Japanese (Preview) Chinese (Preview) | GPU recommended (not mandatory). If used, NVIDIA driver should be R418.40.04, R450.36.06, or a higher version. |
Invoices | Extracts commonly occurring data points from invoices and credit notes. | Pretrained for: English French German Portuguese Romanian Spanish Can be trained for: Languages using Latin alphabet (A, B, C, etc) Languages using Cyrillic alphabet (А, Б, В, etc.) Greek left-to-right alphabet (A, B, Γ, etc.) | GPU recommended (not mandatory). If used, NVIDIA driver should be R418.40.04, R450.36.06, or a higher version. |
InvoicesAustralia | Extracts commonly occurring data points from Australian invoices. | Pretrained for: English French German Portuguese Romanian Spanish Can be trained for: Languages using Latin alphabet (A, B, C, etc) Languages using Cyrillic alphabet (А, Б, В, etc.) Greek left-to-right alphabet (A, B, Γ, etc.) | GPU recommended (not mandatory). If used, NVIDIA driver should be R418.40.04, R450.36.06, or a higher version. |
InvoicesIndia | Extracts commonly occurring data points from Indian invoices. | Pretrained for: English French German Portuguese Romanian Spanish Can be trained for: Languages using Latin alphabet (A, B, C, etc) Languages using Cyrillic alphabet (А, Б, В, etc.) Greek left-to-right alphabet (A, B, Γ, etc.) | GPU recommended (not mandatory). If used, NVIDIA driver should be R418.40.04, R450.36.06, or a higher version. |
InvoicesJapan | Extracts commonly occurring data points from Japanese invoices. | Pretrained for: Japanese | GPU recommended (not mandatory). If used, NVIDIA driver should be R418.40.04, R450.36.06, or a higher version. |
InvoicesChina | Extracts commonly occurring data points from Chinese invoices. | Pretrained for: Chinese | GPU recommended (not mandatory). If used, NVIDIA driver should be R418.40.04, R450.36.06, or a higher version. |
Receipts | Extracts commonly occurring data points from receipts. | Pretrained for: English Finnish French German Norwegian Romanian Spanish Can be trained for: Languages using Latin alphabet (A, B, C, etc) Languages using Cyrillic alphabet (А, Б, В, etc.) Greek left-to-right alphabet (A, B, Γ, etc.) | GPU recommended (not mandatory). If used, NVIDIA driver should be R418.40.04, R450.36.06, or a higher version. |
Purchase Orders | Extracts commonly occurring data points from purchase orders. | Pretrained for: English German Can be trained for: Languages using Latin alphabet (A, B, C, etc) Languages using Cyrillic alphabet (А, Б, В, etc.) Greek left-to-right alphabet (A, B, Γ, etc.) | GPU recommended (not mandatory). If used, NVIDIA driver should be R418.40.04, R450.36.06, or a higher version. |
Utility Bills | Extracts commonly occurring data points from utility bills. | Pretrained for: English Can be trained for: Languages using Latin alphabet (A, B, C, etc) Languages using Cyrillic alphabet (А, Б, В, etc.) Greek left-to-right alphabet (A, B, Γ, etc.) | GPU recommended (not mandatory). If used, NVIDIA driver should be R418.40.04, R450.36.06, or a higher version. |
ID Cards | Extracts commonly occurring data points from ID cards and driver's licenses. | Pretrained for: Australia Austria Belgium Canada Croatia Cyprus Finland France Germany Hong Kong Hungary India Italy Netherlands Poland Romania Spain Switzerland United Kingdom USA (all 50 states plus DC) Can be trained for: Languages using Latin alphabet (A, B, C, etc) Languages using Cyrillic alphabet (А, Б, В, etc.) Greek left-to-right alphabet (A, B, Γ, etc.) | GPU recommended (not mandatory). If used, NVIDIA driver should be R418.40.04, R450.36.06, or a higher version. |
Passports | Extracts commonly occurring data points from passports. | Pretrained for: International | GPU recommended (not mandatory). If used, NVIDIA driver should be R418.40.04, R450.36.06, or a higher version. |
RemittanceAdvices | Extracts commonly occurring data points from remittance advices. | Pretrained for: English Can be trained for: Languages using Latin alphabet (A, B, C, etc) Languages using Cyrillic alphabet (А, Б, В, etc.) Greek left-to-right alphabet (A, B, Γ, etc.) | GPU recommended (not mandatory). If used, NVIDIA driver should be R418.40.04, R450.36.06, or a higher version. |
BillsOfLading | Extracts commonly occurring data points from delivery notes, bills of lading, and sea or air waybills. | Pretrained for: English German Can be trained for: Languages using Latin alphabet (A, B, C, etc) Languages using Cyrillic alphabet (А, Б, В, etc.) Greek left-to-right alphabet (A, B, Γ, etc.) | GPU recommended (not mandatory). If used, NVIDIA driver should be R418.40.04, R450.36.06, or a higher version. |
W2 | Extracts commonly occurring data points from W-2 forms. | Pretrained for: English Can be trained for: Languages using Latin alphabet (A, B, C, etc) Languages using Cyrillic alphabet (А, Б, В, etc.) Greek left-to-right alphabet (A, B, Γ, etc.) | GPU recommended (not mandatory). If used, NVIDIA driver should be R418.40.04, R450.36.06, or a higher version. |
W9 | Extracts commonly occurring data points from W-9 forms. | Pretrained for: English Spanish Can be trained for: Languages using Latin alphabet (A, B, C, etc) Languages using Cyrillic alphabet (А, Б, В, etc.) Greek left-to-right alphabet (A, B, Γ, etc.) | GPU recommended (not mandatory). If used, NVIDIA driver should be R418.40.04, R450.36.06, or a higher version. |
ACORD125 | Extracts commonly occurring data points from ACORD125 forms. | Pretrained for: English Can be trained for: Languages using Latin alphabet (A, B, C, etc) Languages using Cyrillic alphabet (А, Б, В, etc.) Greek left-to-right alphabet (A, B, Γ, etc.) | GPU recommended (not mandatory). If used, NVIDIA driver should be R418.40.04, R450.36.06, or a higher version. |
I9 | Extracts commonly occurring data points from I-9 forms. | Pretrained for: English Can be trained for: Languages using Latin alphabet (A, B, C, etc) Languages using Cyrillic alphabet (А, Б, В, etc.) Greek left-to-right alphabet (A, B, Γ, etc.) | GPU recommended (not mandatory). If used, NVIDIA driver should be R418.40.04, R450.36.06, or a higher version. |
990 | Extracts commonly occurring data points from 990 forms. | Pretrained for: English Finnish French German Norwegian Romanian Spanish Can be trained for: Languages using Latin alphabet (A, B, C, etc) Languages using Cyrillic alphabet (А, Б, В, etc.) Greek left-to-right alphabet (A, B, Γ, etc.) | GPU recommended (not mandatory). If used, NVIDIA driver should be R418.40.04, R450.36.06, or a higher version. |
4506T | Extracts commonly occurring data points from 4506-T forms. | Pretrained for: English Can be trained for: Languages using Latin alphabet (A, B, C, etc) Languages using Cyrillic alphabet (А, Б, В, etc.) Greek left-to-right alphabet (A, B, Γ, etc.) | GPU recommended (not mandatory). If used, NVIDIA driver should be R418.40.04, R450.36.06, or a higher version. |
FM1003 | Extracts commonly occurring data points from FM1003 Loan Application forms. | Pretrained for: English Can be trained for: Languages using Latin alphabet (A, B, C, etc) Languages using Cyrillic alphabet (А, Б, В, etc.) Greek left-to-right alphabet (A, B, Γ, etc.) | GPU recommended (not mandatory). If used, NVIDIA driver should be R418.40.04, R450.36.06, or a higher version. |
ACORD25 | Extracts commonly occurring data points from ACORD25 forms. | Pretrained for: English Can be trained for: Languages using Latin alphabet (A, B, C, etc) Languages using Cyrillic alphabet (А, Б, В, etc.) Greek left-to-right alphabet (A, B, Γ, etc.) | GPU recommended (not mandatory). If used, NVIDIA driver should be R418.40.04, R450.36.06, or a higher version. |
1040 | Extracts commonly occurring data points from 1040 forms. | Pretrained for: English Can be trained for: Languages using Latin alphabet (A, B, C, etc) Languages using Cyrillic alphabet (А, Б, В, etc.) Greek left-to-right alphabet (A, B, Γ, etc.) | GPU recommended (not mandatory). If used, NVIDIA driver should be R418.40.04, R450.36.06, or a higher version. |
Checks | Extracts commonly occurring data points from checks. | Pretrained for: English Can be trained for: Languages using Latin alphabet (A, B, C, etc) Languages using Cyrillic alphabet (А, Б, В, etc.) Greek left-to-right alphabet (A, B, Γ, etc.) | GPU recommended (not mandatory). If used, NVIDIA driver should be R418.40.04, R450.36.06, or a higher version. |
Bank Statements | Extracts commonly occurring data points from bank statements. | Pretrained for: English Can be trained for: Languages using Latin alphabet (A, B, C, etc) Languages using Cyrillic alphabet (А, Б, В, etc.) Greek left-to-right alphabet (A, B, Γ, etc.) | GPU recommended (not mandatory). If used, NVIDIA driver should be R418.40.04, R450.36.06, or a higher version. |
Financial statements | Extracts commonly occurring data points from financial statements. | Pretrained for: English Can be trained for: Languages using Latin alphabet (A, B, C, etc) Languages using Cyrillic alphabet (А, Б, В, etc.) Greek left-to-right alphabet (A, B, Γ, etc.) | GPU recommended (not mandatory). If used, NVIDIA driver should be R418.40.04, R450.36.06, or a higher version. |
Packing Lists | Extracts commonly occurring data points from packing lists. | Pretrained for: English Can be trained for: Languages using Latin alphabet (A, B, C, etc) Languages using Cyrillic alphabet (А, Б, В, etc.) Greek left-to-right alphabet (A, B, Γ, etc.) | GPU recommended (not mandatory). If used, NVIDIA driver should be R418.40.04, R450.36.06, or a higher version. |
ACORD131 | Extracts commonly occurring data points from ACORD131 forms. | Pretrained for: English Can be trained for: Languages using Latin alphabet (A, B, C, etc) Languages using Cyrillic alphabet (А, Б, В, etc.) Greek left-to-right alphabet (A, B, Γ, etc.) | GPU recommended (not mandatory). If used, NVIDIA driver should be R418.40.04, R450.36.06, or a higher version. |
ACORD126 | Extracts commonly occurring data points from ACORD126 forms. | Pretrained for: English Can be trained for: Languages using Latin alphabet (A, B, C, etc) Languages using Cyrillic alphabet (А, Б, В, etc.) Greek left-to-right alphabet (A, B, Γ, etc.) | GPU recommended (not mandatory). If used, NVIDIA driver should be R418.40.04, R450.36.06, or a higher version. |
ACORD140 | Extracts commonly occurring data points from ACORD140 forms. | Pretrained for: English Can be trained for: Languages using Latin alphabet (A, B, C, etc) Languages using Cyrillic alphabet (А, Б, В, etc.) Greek left-to-right alphabet (A, B, Γ, etc.) | GPU recommended (not mandatory). If used, NVIDIA driver should be R418.40.04, R450.36.06, or a higher version. |
Vehicle Titles | Extracts commonly occurring data points from vehicle titles. | Pretrained for: English Can be trained for: Languages using Latin alphabet (A, B, C, etc) Languages using Cyrillic alphabet (А, Б, В, etc.) Greek left-to-right alphabet (A, B, Γ, etc.) | GPU recommended (not mandatory). If used, NVIDIA driver should be R418.40.04, R450.36.06, or a higher version. |
UiPathDocumentOCR | Reads document text. This is a non-trainable model. | Supported languages: Albanian Asturian Basque Bislama Breton Catalan Cebuano Czech Croatian Cornish Danish Dutch English Estonian Fijian Filipino Finnish French - print and handwriting language support Friulian Gagauz Galician German - print and handwriting language support Gilbertese Hani Hungarian Hmong Daw Indonesian Interlingua Irish Italian Javanese Kabuverdianu Kachin Khasi Latin Latvian Lithuanian Luxembourgish Malay Neapolitan Norwegian (print language support) Occitan Polish Portuguese Ripuarian Romanian Romansh Scots Serbian Slovakian Slovenian Spanish Swahili Swedish Tetum Tongan Turkish Uzbek Volapük Welsh Yucatec Maya Zulu | GPU recommended (not mandatory). If used, NVIDIA driver should be R418.40.04, R450.36.06, or a higher version. |
UiPathDocumentOCR_CPU | Reads document text and is optimized to run on CPU. This is a non-trainable model. | Supported languages: Albanian Asturian Basque Bislama Breton Catalan Cebuano Czech Croatian Cornish Danish Dutch English Estonian Fijian Filipino Finnish French - print and handwriting language support Friulian Gagauz Galician German - print and handwriting language support Gilbertese Hani Hungarian Hmong Daw Indonesian Interlingua Irish Italian Javanese Kabuverdianu Kachin Khasi Latin Latvian Lithuanian Luxembourgish Malay Neapolitan Norwegian (print language support) Occitan Polish Portuguese Ripuarian Romanian Romansh Scots Serbian Slovakian Slovenian Spanish Swahili Swedish Tetum Tongan Turkish Uzbek Volapük Welsh Yucatec Maya Zulu | CPU Mandatory, cannot be deployed on GPU. |
OCR for Chinese, Japanese, Korean | Reads document text. This is a non-trainable model. | Supported languages: Chinese (traditional and simplified) - print and handwriting language support Japanese - print and handwriting language support Korean - print and handwriting language support | CPU Mandatory, cannot be deployed on GPU. |
DocumentClassifier | Classifies documents using a custom trained ML model. This is a non-trainable model. | Supported languages: Language agnostic (supports any language as long as the OCR supports it) | GPU recommended (not mandatory). If used, NVIDIA driver should be R418.40.04, R450.36.06, or a higher version. |
FormExtractor | Provides the endpoint required by the Form Extractor activity. This is a non-trainable model, available for On Premises only. | Supported languages: Languages using Latin alphabet (A, B, C, etc) Languages using Cyrillic alphabet (А, Б, В, etc.) Greek left-to-right alphabet (A, B, Γ, etc.) | GPU recommended (not mandatory). If used, NVIDIA driver should be R418.40.04, R450.36.06, or a higher version. |
IntelligentFormExtractor(deprecated) | Provides the endpoint required by the Intelligent Form Extractor activity. This is a non-trainable model, available for On Premises only. | Supported languages: Languages using Latin alphabet (A, B, C, etc) Languages using Cyrillic alphabet (А, Б, В, etc.) Greek left-to-right alphabet (A, B, Γ, etc.) | GPU recommended (not mandatory). If used, NVIDIA driver should be R418.40.04, R450.36.06, or a higher version. |
IntelligentKeywordClassifier | Provides the endpoint required by the Intelligent Keyword Classifier activity. This is a non-trainable model, available for On Premises only. | Supported languages: Languages using Latin alphabet (A, B, C, etc) Languages using Cyrillic alphabet (А, Б, В, etc.) Greek left-to-right alphabet (A, B, Γ, etc.) | GPU recommended (not mandatory). If used, NVIDIA driver should be R418.40.04, R450.36.06, or a higher version. |
HandwritingRecognitionOCR | Reads handwritten text. This is a non-trainable model, available for On Premises only. | Supported languages: English | GPU recommended (not mandatory). If used, NVIDIA driver should be R418.40.04, R450.36.06, or a higher version. |
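
The driver requirement repeated in the hardware column can be checked before deploying on a GPU machine. The sketch below is one possible reading of that requirement, not an official check: it assumes that a driver in the 418 or 450 branch must be at least the version quoted for that branch, and that any other branch only needs to be newer than R418.40.04. The helper names are hypothetical; the `nvidia-smi` query is used only to read the installed version.

```python
import subprocess
from typing import Tuple

# Minimum versions quoted in the table, as (major, minor, patch).
MIN_DRIVERS = [(418, 40, 4), (450, 36, 6)]


def parse_driver_version(text: str) -> Tuple[int, ...]:
    """Turn 'R450.36.06' or '450.36.06' into (450, 36, 6)."""
    return tuple(int(part) for part in text.strip().lstrip("Rr").split("."))


def driver_meets_minimum(version_text: str) -> bool:
    """Apply the assumed rule described above to a driver version string."""
    version = parse_driver_version(version_text)
    for minimum in MIN_DRIVERS:
        if version[0] == minimum[0]:
            # Same branch as a quoted minimum: compare within the branch.
            return version >= minimum
    # Any other branch: accept if newer than the lowest quoted minimum.
    return version >= min(MIN_DRIVERS)


def installed_driver_version() -> str:
    """Read the driver version of the first GPU reported by nvidia-smi."""
    output = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return output.splitlines()[0].strip()
```

On a machine with an NVIDIA driver installed, `driver_meets_minimum(installed_driver_version())` would report whether the driver satisfies this reading of the table. On a CPU-only machine `nvidia-smi` is normally absent, so the call to `installed_driver_version()` would raise an error instead.
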