Document Understanding Activities

Last updated Dec 5, 2024

About the IntelligentOCR activity package

UiPath.IntelligentOCR.Activities contains the infrastructure for enabling document processing flows using a complete, open, extensible approach.

Replacing removed versions

The following table shows the package versions that were removed and the recommended version to use instead.

Table 1. The removed versions and their recommended replacements
Removed versions	Recommended version
4.3.0-preview, 4.4.0-preview	4.5.2
2.1.0, 2.2.0, 2.3.0	4.0.1
1.4.0, 1.5.0, 1.6.0, 1.6.1, 2.0.0, 2.0.1	2.0.2
1.2.0, 1.2.1, 1.3.0	1.3.2
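
When auditing projects for removed versions, the mapping in Table 1 can be written as a simple lookup. The following Python sketch is only an illustration: the names REPLACEMENTS, RECOMMENDED, and recommended_version are hypothetical helpers and are not part of any UiPath tooling.

```python
from typing import Optional

# Removed package versions grouped by recommended replacement (values from Table 1).
REPLACEMENTS = {
    ("4.3.0-preview", "4.4.0-preview"): "4.5.2",
    ("2.1.0", "2.2.0", "2.3.0"): "4.0.1",
    ("1.4.0", "1.5.0", "1.6.0", "1.6.1", "2.0.0", "2.0.1"): "2.0.2",
    ("1.2.0", "1.2.1", "1.3.0"): "1.3.2",
}

# Flatten to one entry per removed version for direct lookup.
RECOMMENDED = {
    removed: target
    for removed_group, target in REPLACEMENTS.items()
    for removed in removed_group
}


def recommended_version(installed: str) -> Optional[str]:
    """Return the replacement for a removed version, or None if it was not removed."""
    return RECOMMENDED.get(installed)
```

A version that does not appear in the table returns None, which can be read as "no replacement needed according to Table 1".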

Important

Starting with the v6.19.0 release, when installing the UiPath.IntelligentOCR.Activities package in a project, the UiPath.DocumentUnderstanding.ML.Activities package is automatically installed as well and you do not need to install it separately.
If you are using UiPath® Studio 2023.4.4 or earlier, make sure to install the latest version of Windows .NET 6.0 Desktop Runtime.

Version compatibility

Updating the UiPath.IntelligentOCR.Activities package also requires updating the UiPath.UIAutomation.Activities package and, if it is included in the project, the UiPath.OCR.Activities package.

UiPath.IntelligentOCR.Activities and UiPath.DocumentUnderstanding.Activities should not be used together in the same project. The UiPath.IntelligentOCR.Activities package should be used for Windows (or Legacy) workflows, while the UiPath.DocumentUnderstanding.Activities package should be used for Cross-platform workflows.

Supported formats

The IntelligentOCR activity package supports the following file types: .png, .gif, .jpe, .jpg, .jpeg, .tiff, .tif, .bmp, and .pdf.
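
Before handing files to a workflow, it can be useful to filter out unsupported types. The following Python sketch checks a file name against the extensions listed above. The helper name is_supported is hypothetical, and the case-insensitive comparison is an assumption rather than documented package behavior.

```python
from pathlib import Path

# Extensions listed in the "Supported formats" section.
SUPPORTED_EXTENSIONS = {
    ".png", ".gif", ".jpe", ".jpg", ".jpeg", ".tiff", ".tif", ".bmp", ".pdf",
}


def is_supported(file_path: str) -> bool:
    """Return True when the file extension is in the supported list.

    Assumption: extensions are compared case-insensitively.
    """
    return Path(file_path).suffix.lower() in SUPPORTED_EXTENSIONS
```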

Support for C# project language

Starting with version 4.10.0, this activity package is validated for use in C# projects.

Functionalities

This section describes the functionalities of the IntelligentOCR package.

Digitize documents

You can achieve this using the Digitize Document activity. This retrieves the text from any PDF or image, applying the OCR engine of your choice only when necessary.

Documents are processed one by one, and each goes through the digitization process. For non-digital (scanned) documents, you need to apply the OCR engine of your choice. This step outputs the Document Object Model and a string variable containing all the document text, both of which are passed down to the next steps.
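
The "OCR only if necessary" idea can be sketched as a fallback: use the text already embedded in the file when there is enough of it, and call the OCR engine otherwise. This Python sketch is conceptual and does not reproduce the Digitize Document activity. The digitize function, the run_ocr callable, and the min_chars threshold are all assumptions made for illustration.

```python
from typing import Callable


def digitize(native_text: str, run_ocr: Callable[[], str], min_chars: int = 20) -> str:
    """Return embedded text when present; otherwise fall back to OCR.

    native_text: text read directly from the file (empty for scanned documents).
    run_ocr: hypothetical callable that runs the chosen OCR engine on demand.
    min_chars: assumed threshold below which the embedded text is treated as missing.
    """
    if len(native_text.strip()) >= min_chars:
        return native_text
    # Not enough embedded text, so the document is treated as scanned.
    return run_ocr()
```

Passing the OCR engine as a callable means it runs only when the fallback branch is reached.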

Classify documents

You can achieve this using the Classify Document activity. This identifies what type of document a file is, using any classification algorithm.

After digitization, the document is classified. If you are working with multiple document types in the same project, you need to know which type of document you are working with to extract data properly. You can use multiple classifiers in the same scope, configure them, and, later in the framework, train them. The classification results help in applying the right extraction strategy.

The following list shows the available classifiers:

The Keyword Based Classifier activity is the first such classifier, targeting classification for titled documents.
The Intelligent Keyword Classifier activity can not only classify but also "split" files that contain multiple document types within them.
The Machine Learning Classifier activity can classify your files using an ML model that you can train according to your needs.
The Generative Classifier activity allows you to classify documents using generative models.
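
To make the idea of keyword-based classification for titled documents more concrete, the following Python sketch counts keyword hits in the first few lines of the digitized text. It is not the algorithm used by the Keyword Based Classifier activity. The keyword sets, the classify_by_title function, and the title_lines parameter are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical keyword sets per document type; real projects configure these per use case.
KEYWORDS: Dict[str, List[str]] = {
    "Invoice": ["invoice", "tax invoice"],
    "Receipt": ["receipt"],
    "PurchaseOrder": ["purchase order"],
}


def classify_by_title(document_text: str, title_lines: int = 5) -> Optional[str]:
    """Return the document type with the most keyword hits near the top of the text.

    Returns None when no keyword is found, which a workflow could route to
    human classification.
    """
    head = "\n".join(document_text.splitlines()[:title_lines]).lower()
    best_type, best_hits = None, 0
    for doc_type, words in KEYWORDS.items():
        hits = sum(head.count(word) for word in words)
        if hits > best_hits:
            best_type, best_hits = doc_type, hits
    return best_type
```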

Validate automatic classification

You can achieve this using the Present Classification Station attended activity, which presents a document processing specific user interface for validating and correcting automatic classification outputs.

Especially for use cases in which file splitting is involved, using the human classification validation step is strongly recommended, to ensure that downstream processing for data extraction works properly.

An alternative to the attended activity is available through long-running workflows, which are designed to enable human-robot collaboration. The Create Document Classification Action and the Wait For Document Classification Action And Resume activities enable this scenario.

Train classifiers

You can achieve this using the Train Classifiers Scope activity. This closes the feedback loop for any classification algorithm capable of learning. Drag and drop your classifier trainers into this Scope activity and enable them using the Configure Classifiers wizard, so that the information validated by humans through the Classification Station or Validation Station is used by your classifiers to improve their performance.

Classification is only as efficient as the classifiers used. If a document wasn’t classified properly, it means it was unknown to the active classifiers. The framework provides the opportunity to train the classifiers to improve recognition of the document classes.

The following is a list of the available classifier trainers:

The Keyword Based Classifier Trainer is the trainer activity paired with the Keyword Based Classifier.
The Intelligent Keyword Classifier Trainer enables the feedback loop for Intelligent Keyword Classifier.
The Machine Learning Classifier Trainer is the trainer activity paired with the Machine Learning Classifier.

Extract data from documents

You can achieve this using the Data Extraction Scope activity. This lets you use any data extraction algorithm to identify different fields in a classified document.

Extraction means retrieving only the data you are interested in from a given document type. For example, extracting specific data from a 5-page document using string manipulation alone is cumbersome. In this framework, you can use different extractors for different document structures in the same data extraction scope. The extraction results are then passed on for validation.

The following is a list of available extractors:

The RegEx Based Extractor is a basic data extractor that applies regular expression matching to identify the best candidates for a specific field.
The Form Extractor uses predefined templates to enable the processing of structured, fixed form documents.
The Machine Learning Extractor leverages the power of AI and Machine Learning to identify information in structured or semi-structured documents by either using one of UiPath®'s public data extraction services or by calling custom trained Machine Learning models that you can build and host in AI Center. This activity is part of the UiPath.DocumentUnderstanding.ML.Activities package.
The Generative Extractor allows you to extract data from documents using generative models. This activity is part of the UiPath.DocumentUnderstanding.ML.Activities package.
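
As a rough illustration of regular expression matching for field extraction, the following Python sketch applies one pattern per field to the document text and keeps the first match. It does not reproduce the RegEx Based Extractor activity or its candidate ranking. The field names, the patterns, and the extract_fields function are hypothetical.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns; each captures the field value in group 1.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"\bInvoice\s*(?:Number|No\.?|#)\s*[:\-]?\s*([A-Z0-9\-]+)", re.IGNORECASE
    ),
    "total": re.compile(
        r"\bTotal\s*[:\-]?\s*\$?\s*([0-9]+(?:[.,][0-9]{2})?)", re.IGNORECASE
    ),
}


def extract_fields(text: str) -> Dict[str, Optional[str]]:
    """Return the first match for each field, or None when the pattern is not found."""
    results: Dict[str, Optional[str]] = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        results[name] = match.group(1) if match else None
    return results
```

Fields that come back as None are natural candidates for a different extractor or for human validation.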

Validate automatic data extraction results

You can achieve this using the Present Validation Station attended activity, which presents a document processing specific user interface for data validation and correction.

The extracted data can be validated by a human user through the Validation Station. A best practice is to build logic that decides whether to add a human validation step, with rules that depend on the specific use case being implemented. Validation results can then be exported and used in further automation activities.
You can also enable human validation through long-running workflows, which optimize human-robot collaboration, using the Create Document Validation Action and Wait for Document Validation Action and Resume activities.
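
The decision of whether to add a human validation step can be expressed as a small rule set. The following Python sketch assumes each extracted field carries a confidence score between 0.0 and 1.0. The ExtractedField class, the needs_human_validation function, and the 0.8 threshold are hypothetical and must be adapted to the use case.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # assumed score between 0.0 and 1.0
    required: bool = False


def needs_human_validation(fields: List[ExtractedField], threshold: float = 0.8) -> bool:
    """Return True when any rule asks for a human review.

    Rule 1: a required field has no value.
    Rule 2: any field's confidence falls below the threshold.
    """
    for item in fields:
        if item.required and not item.value:
            return True
        if item.confidence < threshold:
            return True
    return False
```

When the function returns False, a workflow could skip the Validation Station and continue straight to export.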

Train extractors

You can achieve this using the Train Extractors Scope activity. This closes the feedback loop for any data extraction algorithm capable of learning. Drag and drop your extractor trainers into this Scope activity and enable them using the Configure Extractors wizard, so that the information validated by humans through the Validation Station is used by your extractors to improve their performance.

Extraction is only as efficient as the extractors used. If field values were not extracted properly, it means they were unknown to the active extractors. The framework provides the opportunity to train the extractors to improve recognition of field values.

The Machine Learning Extractor Trainer closes the feedback loop for ML-based data extraction, by collecting the data required for retraining a Machine Learning model hosted in AI Center. This activity is the companion of Machine Learning Extractor and is part of the UiPath.DocumentUnderstanding.ML.Activities package.

Export extracted information

You can achieve this using the Export Extraction Results activity. This allows you to export the complex structure of extracted data to a simple DataSet (collection of DataTables).

Once you have your validated information, you can use it as it is, or save it in a DataTable format that can be converted very easily into an Excel file.
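
The idea of flattening nested extraction results into a set of tables can be sketched as follows. This Python example is not the Export Extraction Results activity. The shape of the results dictionary, the table names, and the to_tables and table_to_csv helpers are assumptions, and CSV is used here only as a simple format that spreadsheet tools can open.

```python
import csv
import io
from typing import Dict, List

# Hypothetical validated results: simple fields plus one table-type field.
results = {
    "simple_fields": {"invoice_number": "INV-001", "total": "120.00"},
    "line_items": [
        {"description": "Widget", "quantity": "2", "amount": "40.00"},
        {"description": "Gadget", "quantity": "1", "amount": "80.00"},
    ],
}


def to_tables(data: dict) -> Dict[str, List[dict]]:
    """Split nested results into named tables, each a list of row dictionaries."""
    return {
        "SimpleFields": [
            {"field": key, "value": value}
            for key, value in data["simple_fields"].items()
        ],
        "LineItems": list(data["line_items"]),
    }


def table_to_csv(rows: List[dict]) -> str:
    """Serialize one table to CSV text, using the first row's keys as the header."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```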

The UiPath.IntelligentOCR.Activities package is compatible with any custom classification or data extraction activity that is built on the public UiPath.DocumentProcessing.Contracts package. This gives you full flexibility to build your own algorithm specific to your use case, and to integrate it with any third-party solution for document classification and data extraction.

The package versions listed in Table 1 have been removed from the official feed. If you encounter any issues, reach out to our support teams.
