UiPath Document Understanding
These are out-of-the-box Machine Learning models that classify documents and extract commonly occurring data points from semi-structured or unstructured documents, including regular fields, table columns, and classification fields, using a template-less approach.
Document Understanding contains multiple ML Packages split into 4 main categories:
- UiPath Document OCR
- Document Understanding
- Out of the box Pre-Trained ML Models
- Other Out of the Box Packages
UiPath Document OCR is a non-retrainable model that can be used with the UiPath Document OCR Engine activity as part of the Digitize Document activity. To use it, the ML Skill must first be made public so that its URL can be copied and pasted into the UiPath Document OCR Engine activity.
UiPath Document OCR requires access to the Document Understanding metering server at https://du.uipath.com/metering when the ML skill runs on a regular on-premises AI Center deployment. Airgapped on-premises AI Center deployments need no internet access.
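On a regular on-premises deployment it can help to confirm, before deploying the skill, that the host can reach the metering endpoint. The following is a minimal sketch using only the Python standard library; it is not part of any UiPath tooling, the `is_reachable` name and the 5-second timeout are illustrative choices, and it assumes that any HTTP response (even an error status) counts as "reachable", since the goal is to verify the network path rather than a particular reply.

```python
import urllib.error
import urllib.request

# Metering endpoint named in the documentation above.
METERING_URL = "https://du.uipath.com/metering"


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP(S) connection to `url` succeeds.

    Assumption: any HTTP status, including 4xx/5xx, means the network path is
    open; only connection-level failures (DNS, refused, timeout) return False.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status, so it is reachable.
        return True
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, blocked by firewall, etc.
        return False
```

For example, if `is_reachable(METERING_URL)` returns False on a regular on-premises deployment, a firewall or proxy may be blocking the metering endpoint. Note that `urllib` picks up proxy settings from the standard `HTTP_PROXY`/`HTTPS_PROXY` environment variables.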
Document Understanding is a generic, retrainable model for extracting any commonly occurring data points from any type of structured or semi-structured document, building a model from scratch. This ML Package must be trained before it is deployed; deploying it without training fails with an error stating that the model is not trained.
The Out of the box Pre-Trained ML Models are retrainable ML Packages that already hold the knowledge of Machine Learning models trained for specific document types.
They can be customized through Pipeline runs to extract additional fields or to support additional languages. Using state-of-the-art transfer learning, these models can be retrained on additional labelled documents and tailored to a specific use case, or extended to support additional languages written in Latin, Cyrillic, or Greek script.
The training dataset may contain the same fields as the pre-trained model, a subset of those fields, or additional fields. To benefit from the knowledge already contained in the pre-trained model, use fields with the same names as in the out-of-the-box model itself.
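Because only fields whose names match the out-of-the-box model reuse its knowledge, it may be worth comparing a dataset's field names with the model's before starting a training pipeline. This is a minimal sketch that assumes both field lists are available as plain strings and that names are matched exactly (case-sensitive), which is an assumption rather than documented behavior. The field names below are hypothetical placeholders, not the actual out-of-the-box schema.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FieldComparison:
    reused: frozenset[str]   # same name as in the pre-trained model
    new: frozenset[str]      # additional fields the model has not seen
    omitted: frozenset[str]  # model fields missing from the dataset (subset case)


def compare_fields(model_fields: set[str], dataset_fields: set[str]) -> FieldComparison:
    """Split field names by whether they exactly match the model's (assumed rule)."""
    return FieldComparison(
        reused=frozenset(dataset_fields & model_fields),
        new=frozenset(dataset_fields - model_fields),
        omitted=frozenset(model_fields - dataset_fields),
    )


# Hypothetical field names, for illustration only.
MODEL_FIELDS = {"invoice-no", "invoice-date", "total"}
DATASET_FIELDS = {"invoice-no", "invoice-date", "InvoiceTotal", "po-number"}

result = compare_fields(MODEL_FIELDS, DATASET_FIELDS)
```

Here `InvoiceTotal` ends up in `new` and the model's `total` in `omitted`; under the exact-match assumption, renaming the dataset field to `total` would move it into `reused`.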
These ML Packages are:
- Invoices: The fields extracted out-of-the-box can be found here.
- Receipts: The fields extracted out-of-the-box can be found here.
- Purchase Orders (Preview): The fields extracted out-of-the-box can be found here.
- Utility Bills (Preview): The fields extracted out-of-the-box can be found here.
- Invoices India (Preview): The fields extracted out-of-the-box can be found here.
- Invoices Australia (Preview): The fields extracted out-of-the-box can be found here.
- Invoices Japan (Preview): The fields extracted out-of-the-box can be found here.
These models are deep learning architectures built by UiPath. A GPU can be used at both serving time and training time but is not mandatory. For training in particular, a GPU delivers a more than 10x improvement in speed.
The Other Out of the Box Packages are non-retrainable packages required by the non-ML components of the Document Understanding suite.
These ML Packages are:
- Form Extractor (FE): Deploy as a Public Skill and paste the URL into the Form Extractor activity.
- Intelligent Form Extractor (IFE): Deploy as a Public Skill and paste the URL into the Intelligent Form Extractor activity. Make sure to first deploy the Handwriting OCR Skill and configure it as the OCR for the IFE package.
- Intelligent Keyword Classifier (IKC): Deploy as a Public Skill and paste the URL into the Intelligent Keyword Classifier activity.
- Handwriting OCR: Deploy as a Public Skill and use it as the OCR when creating the IFE package.
Input type: file (accepted formats: pdf, png, bmp, jpeg, tiff)
Input description: The file needs to be digitized, and the input is provided in the Data Extraction Scope activity. More details here.
Output: a .json file with all fields extracted by the Machine Learning model. The output is configured in the Data Extraction Scope activity and stored in the ExtractionResults variable. This result can be transformed into a DataSet using the Export Extraction Results activity. More details here.
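Since only pdf, png, bmp, jpeg, and tiff files are accepted as input, a process that collects files from a folder may want to set aside anything else before digitization. This is a minimal sketch that checks file extensions only; treating `.jpg` and `.tif` as aliases of jpeg and tiff is an assumption, and an extension check does not guarantee that the file content itself is valid. The function names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Formats listed as accepted; ".jpg" and ".tif" are ASSUMED aliases.
ACCEPTED_EXTENSIONS = {".pdf", ".png", ".bmp", ".jpeg", ".jpg", ".tiff", ".tif"}


def is_accepted_format(path: str | Path) -> bool:
    """Return True if the file's last extension (case-insensitive) is accepted."""
    return Path(path).suffix.lower() in ACCEPTED_EXTENSIONS


def partition_files(paths: list[str]) -> tuple[list[str], list[str]]:
    """Split paths into (accepted, rejected), preserving the input order."""
    accepted = [p for p in paths if is_accepted_format(p)]
    rejected = [p for p in paths if not is_accepted_format(p)]
    return accepted, rejected
```

For example, `partition_files(["scan.PDF", "photo.jpg", "notes.docx"])` keeps the first two and rejects the Word document. Only the last suffix is inspected, so `archive.tar.gz` is judged by `.gz`.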