About the IntelligentOCR Activities Pack

This pack contains the infrastructure for enabling document processing flows using a complete, open, extensible approach.

Note:

If you encounter a runtime error that mentions the Docotic.Pdf library, upgrade the UiPath.IntelligentOCR.Activities package to v3.1.0 or higher.

Note:

Starting with UiPath.IntelligentOCR.Activities v4.0.0, all Abbyy-related activities have been moved to a separate package. Install the UiPath.Abbyy.Activities package if you want to use its activities for OCR, Cloud OCR, classification, and data extraction.

The pack allows you to:

  • Digitize documents, using the Digitize Document activity. This retrieves the text from any PDF or image and applies the OCR engine of your choice only when necessary.

    • Documents are processed one by one, and each goes through digitization. Non-digital (scanned) documents differ only in that the OCR engine of your choice must be applied. This step outputs the Document Object Model and a string variable containing all the document text, both of which are passed to the next steps.
  • Classify documents, using the Classify Document Scope activity. This identifies the type of a document by using any classification algorithm. The Keyword Based Classifier activity is the first such classifier, targeting titled documents. The FlexiCapture Classifier, which embeds the Abbyy FlexiCapture technology, is also incorporated into our product.

    • After digitization, the document is classified. If you work with multiple document types in the same project, you need to know the type of each document to extract data properly. You can use multiple classifiers in the same scope, configure them and, later in the framework, train them. The classification results help you apply the right extraction strategy.
  • Train classifiers, using the Train Classifiers Scope activity. This closes the feedback loop for any classification algorithm capable of learning (for example, the Keyword Based Classifier). Drag and drop your classifier trainers into this Scope activity and enable them using the Configure Classifiers wizard, so that the information validated by humans through the Validation Station is used by your classifiers to improve their performance.

    • Classification is only as efficient as the classifiers used. If a document was not classified properly, it was unknown to the active classifiers. The framework lets you train the classifiers to improve recognition of the document classes.
  • Extract data from documents, using the Data Extraction Scope activity. This lets you use any data extraction algorithm to identify fields in a classified document. The FlexiCapture Extractor is one such example, incorporating the Abbyy FlexiCapture technology into our product. The Regex Based Extractor is a basic data extractor that applies regular expression matching to identify the best candidates for a required value. The Form Extractor and Intelligent Form Extractor are further extraction methods, focused on processing fixed-form documents.

    • Extraction means retrieving just the data you are interested in. For example, extracting specific data from a 5-page document with string manipulation alone is troublesome. In this framework, you can use different extractors for different document structures within the same scope. The extraction results are passed on for validation.
  • Train extractors, using the Train Extractors Scope activity. This closes the feedback loop for any data extraction algorithm capable of learning. Drag and drop your extractor trainers into this Scope activity and enable them using the Configure Extractors wizard, so that the information validated by humans through the Validation Station is used by your extractors to improve their performance.

    • Extraction is only as efficient as the extractors used. If field values were not extracted properly, they were unknown to the active extractors. The framework lets you train the extractors to improve recognition of field values.
  • Validate automatic classification and data extraction, using the Present Validation Station attended activity, which presents a user interface specific to document processing for data validation and correction.

    • The extracted data can be validated by a human user through the Validation Station. A best practice is to build logic that decides whether to add a human validation step, with rules that depend on the specific use case. Validation results can then be exported and used in further automation activities.
  • Export extracted information, using the Export Extraction Results activity. This allows you to export the complex structure of extracted data to a simple DataSet (collection of DataTables).

    • Once you have your validated information, you can use it as is, or save it in a DataTable format that is easily converted into an Excel file.
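
The flow described in the list above (classify, extract, decide on validation, export) can be illustrated with a small, self-contained Python sketch. This is not the UiPath API: the function names, the keyword sets, and the regular expressions below are all hypothetical, and the text is assumed to have already been produced by a digitization step. The sketch only mirrors the idea of a keyword-based classifier, a regex-based extractor, a rule that decides whether human validation is needed, and a flat, table-like export.

```python
import re

# Hypothetical keyword sets per document type (illustration only, not UiPath data).
KEYWORDS = {
    "Invoice": ["invoice", "amount due"],
    "Receipt": ["receipt", "total paid"],
}

# Hypothetical regular expressions per document type and field.
PATTERNS = {
    "Invoice": {
        "InvoiceNumber": r"Invoice\s*(?:No\.?|#)\s*(\w+)",
        "Total": r"Amount due:\s*([\d.,]+)",
    },
}


def classify(text):
    """Return the document type whose keywords occur most often, or None."""
    lowered = text.lower()
    scores = {
        doc_type: sum(lowered.count(keyword) for keyword in keywords)
        for doc_type, keywords in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def extract(text, doc_type):
    """Apply each field's regex to the text; fields with no match map to None."""
    results = {}
    for name, pattern in PATTERNS.get(doc_type, {}).items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        results[name] = match.group(1) if match else None
    return results


def needs_validation(fields):
    """Example rule: ask for human validation when any field is missing."""
    return any(value is None for value in fields.values())


def export_rows(doc_type, fields):
    """Flatten the extracted fields into simple table-like rows."""
    return [
        {"DocumentType": doc_type, "Field": name, "Value": value}
        for name, value in fields.items()
    ]


if __name__ == "__main__":
    text = "INVOICE No. A123\nAmount due: 1,250.00"  # stands in for digitized text
    doc_type = classify(text)
    fields = extract(text, doc_type)
    print(doc_type, fields, needs_validation(fields))
    print(export_rows(doc_type, fields))
```

The validation rule here is deliberately simplistic. In a real project the decision to involve a human would depend on the use case, as noted above.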

Note:

If you want to use the UiPath.IntelligentOCR.Activities package in the same project as the UiPath.PDF.Activities package, you need to use either version 2.x of both or version 3.x of both.
UiPath.IntelligentOCR.Activities version 3.0 or higher is incompatible with a UiPath.PDF.Activities version lower than 3.0, and UiPath.PDF.Activities version 3.0 or higher is incompatible with a UiPath.IntelligentOCR.Activities version lower than 3.0.
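
As a rough illustration of the rule in this note, the check below treats two versions as compatible when they fall on the same side of the 3.0 boundary. It is only a sketch of one reading of the note: the helper names `major` and `compatible` are hypothetical, version strings are assumed to look like `3.1.0` or `v4.0.0`, and any other compatibility constraints between the packages are not modeled.

```python
def major(version):
    """Return the major component of a version string such as '3.1.0' or 'v4.0.0'."""
    return int(version.lstrip("vV").split(".")[0])


def compatible(intelligent_ocr_version, pdf_version):
    """True when both packages are below 3.0 or both are at 3.0 or higher."""
    return (major(intelligent_ocr_version) >= 3) == (major(pdf_version) >= 3)


if __name__ == "__main__":
    print(compatible("2.1.0", "2.0.3"))
    print(compatible("3.1.0", "2.9.0"))
```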

The IntelligentOCR package is compatible with any custom classification or data extraction activity built on the public package UiPath.DocumentProcessing.Contracts. This gives you full flexibility to build your own algorithm specific to your use case and to integrate it with any third-party solution for document classification and data extraction.

Note:

ABBYY FlexiCapture Engine SDK is required if you want to use the FlexiCapture Classifier or FlexiCapture Extractor activities. The engine only works with an Abbyy FlexiCapture Engine StandAlone 12 license distributed by the UiPath sales department. To request a license, access the Contact us page, go to Technical Support & Activations, fill in the form, and choose Service Request after providing a Name and Email.


