Form Extractor

UiPath.IntelligentOCR.Activities.DataExtraction.FormExtractor

Note:

For licensing purposes, the Form Extractor activity requires an Internet connection when the robot runs.

Extracts, matches, and reports the required information by taking into consideration the words' position inside the document. This activity can be used only together with the Data Extraction Scope activity.

Note:

You can also check the Preview versions of this activity.

Note:

Some limitations are in place for the Community package versions:

  • The size of the documents is limited to 2 pages.
  • Community endpoints are rate-limited per IP address to 50 requests per hour. This count is completely independent of the other services (Machine Learning, Intelligent Form Extractor). If the rate limit is reached, a 429 - Too Many Requests error is displayed and the IP address is blocked for 1 hour.
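
Because hitting the limit blocks the IP address for an hour, a caller may want to throttle its own requests. The following is a minimal sketch of a client-side sliding-window limiter, assuming the 50 requests per hour figure stated above. The class `HourlyRateLimiter` and its method are hypothetical helpers and are not part of any UiPath package.

```python
from collections import deque
import time


class HourlyRateLimiter:
    """Hypothetical client-side guard: at most `limit` calls per `window` seconds."""

    def __init__(self, limit=50, window=3600.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock        # injectable so the limiter can be tested
        self._stamps = deque()    # times of the calls still inside the window

    def try_acquire(self):
        """Record the call and return True if it fits in the window, else False."""
        now = self.clock()
        # Drop calls that have aged out of the sliding window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) < self.limit:
            self._stamps.append(now)
            return True
        return False
```

A caller would check `try_acquire()` before each extraction request and wait or queue the document when it returns `False`.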

Properties

Common

  • DisplayName - The display name of the activity.

Input

  • ApiKey - Specifies the API key of the account.
  • Endpoint - The URL of the UiPath server.
  • MinOverlapPercentage - Specifies the minimum overlap area (in percentage) between a box in the document and a box in the template required to make an extraction. The percentage value can be set between 0 and 100. The default value is 65.
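
To illustrate what an overlap percentage between two boxes can mean, here is a minimal sketch. It assumes boxes are given as `(left, top, right, bottom)` tuples and that the overlap is measured as the share of the document box's area that falls inside the template box. The choice of denominator is an assumption, since the exact computation used by the activity is not described here, and both function names are hypothetical.

```python
def overlap_percentage(doc_box, template_box):
    """Percentage of doc_box's area that lies inside template_box.

    Boxes are (left, top, right, bottom) tuples. Using the document box
    as the denominator is an assumption, not documented behavior.
    """
    left = max(doc_box[0], template_box[0])
    top = max(doc_box[1], template_box[1])
    right = min(doc_box[2], template_box[2])
    bottom = min(doc_box[3], template_box[3])
    width = max(0.0, right - left)     # zero when the boxes do not intersect
    height = max(0.0, bottom - top)
    doc_area = (doc_box[2] - doc_box[0]) * (doc_box[3] - doc_box[1])
    if doc_area <= 0:
        return 0.0
    return 100.0 * width * height / doc_area


def passes_min_overlap(doc_box, template_box, min_overlap_percentage=65):
    """True when the overlap reaches the threshold (65 mirrors the default)."""
    return overlap_percentage(doc_box, template_box) >= min_overlap_percentage
```

With these assumptions, a lower threshold accepts words that only partly fall inside a template box, while a higher threshold enforces the box boundaries more strictly.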

Misc

  • Private - If selected, the values of variables and arguments are no longer logged at Verbose level.

Note:

Multiple templates can be defined for one Document Type. When the activity is run, the extractor selects the best matching template based on the information found on the first page.

Capabilities and Usability

Like all extractors, the Form Extractor works within the Data Extraction Scope activity. If required, you can add multiple Form Extractor instances in the same scope, and you can combine the Form Extractor with any other extraction method to obtain the best results.

The Form Extractor activity can be used for extracting information from documents that have a fixed format (for example, IRS forms, bank forms, and insurance forms).
The activity allows you to define one or more templates for any of the document types in your taxonomy, using the Template Manager wizard accessible from the Configure Templates link in the activity.
Once the templates are configured, the fields need to be activated for data extraction using the Configure Extractors wizard of the Data Extraction Scope parent activity.

At run-time, the activity is called by the Data Extraction Scope if any fields were previously selected to be processed using a particular instance of the Form Extractor.
The activity checks the template definitions (one or more for the incoming document type) against the document to be processed: it searches for all the keywords defined in the templates as Page 1 Matching Info and decides whether the template matches them.
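
The matching step can be pictured with a simplified sketch. It assumes a template is represented only by its list of page 1 keywords and is scored by the fraction of those keywords found among the words on the first page. The real activity also takes word positions into account, which this sketch ignores. The function names, the `threshold` parameter, and the data layout are all hypothetical.

```python
def keyword_match_ratio(page_words, keywords):
    """Fraction of a template's keywords found among the first-page words."""
    if not keywords:
        return 0.0
    words = {w.lower() for w in page_words}
    found = sum(1 for k in keywords if k.lower() in words)
    return found / len(keywords)


def best_template(page_words, templates, threshold=0.5):
    """Return the best-scoring template name, or None if none reaches threshold.

    `templates` maps a template name to its list of page 1 keywords.
    The threshold value is an illustrative assumption.
    """
    best_name, best_score = None, 0.0
    for name, keywords in templates.items():
        score = keyword_match_ratio(page_words, keywords)
        if score >= threshold and score > best_score:
            best_name, best_score = name, score
    return best_name
```

For example, with two templates for the same document type that differ only in a year keyword, the one whose year appears on the first page scores higher and is selected.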

If more than one template is defined for a given document type, the activity identifies the best matching template and continues the data extraction process using that template alone.
Once the template is identified, the requested values are computed and reported. The confidence of each value is influenced by the template match confidence along with the word match confidences for all the words that are included in that value. Word level matching is also influenced by the MinOverlapPercentage parameter, as it governs how strictly the Form Extractor should enforce value boundaries.
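
The text above only says that a value's confidence is influenced by the template match confidence and the word match confidences; it does not give a formula. Purely as an illustration, the sketch below assumes the value confidence is the template confidence multiplied by the mean of the word confidences. Both the formula and the function name are assumptions, not the activity's actual computation.

```python
def value_confidence(template_confidence, word_confidences):
    """Illustrative only: template confidence scaled by the mean word confidence.

    All confidences are expected in the range 0.0 to 1.0.
    """
    if not word_confidences:
        return 0.0
    mean_word = sum(word_confidences) / len(word_confidences)
    return template_confidence * mean_word
```

Under this assumption, a weakly matched template lowers the confidence of every value extracted with it, even when the individual words matched well.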

Using the Template Manager Wizard

This wizard allows you to create, edit, and export/import templates for the document types defined in the taxonomy.

Creating a template

  1. Add a Form Extractor activity to your workflow.
  2. Configure the extractor by clicking the Manage Templates button.
    • The Template Manager window opens.
  3. Click the Create Template button to create a new template.
  4. Select the desired type from the Document Type drop-down list.

Note:

All Document Types are based on the Taxonomy. Make sure to add or create a Taxonomy inside the project's folder.

  5. Add the name of the template in the Template name field.
  6. Add the document's path in the Template document field.
    • Navigate to the file's path by using the Browse button.
  7. Select an OCR engine from the OCR Engine drop-down list.
  8. Click the Configure button to confirm and save the template.
    If a template already exists, you can choose to Edit or Remove it.

The OCR engine is applied only if necessary. If the document selected for building a template is a Native PDF, then no OCR engine is executed.
Each OCR engine comes with its own set of custom options. See the tables below for more details:

Microsoft OCR

| Options | Description |
| --- | --- |
| Languages | Select one of the available languages. |
| Scale | Set up the scale value of the document. |
| Profile | Select the profile type of the OCR engine. The default value is Screen. |

Tesseract OCR

| Options | Description |
| --- | --- |
| Languages | Select one of the available languages. |
| Profile | Select the profile type of the OCR engine. The default value is Screen. |
| Scale | Set up the scale value of the document. |
| Invert | If selected, inverts the colors of the UI elements before scraping. This is useful when the background is darker than the text color. |

OmniPage OCR

| Options | Description |
| --- | --- |
| EnginePack | Select the type of the engine pack. |
| Languages | Select one of the available languages. |
| Profile | Select the profile type of the OCR engine. The default value is Screen. |
| Scale | Set up the scale value of the document. |

If you have already created a template, it can be edited, exported, or removed.
The Delete and Export buttons become available only when at least one template is selected. The Edit and Remove options for an individual template are always available.

For documents that include check boxes, you can add known synonyms for the Yes and No options, or you can start from a list compiled by us (see the Add Recommended suggestions). These values are used for Boolean content interpretation, which maps a captured value to a reported Yes or No value. After the template is run, a computation confidence percentage is displayed, helping you decide whether human validation is required.
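
Boolean content interpretation can be sketched as a lookup against two synonym sets. The synonym lists below are made-up examples, not the recommended list shipped with the activity, and the function name is hypothetical.

```python
# Example synonym sets; the actual recommended values may differ.
YES_SYNONYMS = {"yes", "y", "x", "true", "checked"}
NO_SYNONYMS = {"no", "n", "false", "unchecked", ""}


def interpret_boolean(captured, yes_synonyms=YES_SYNONYMS, no_synonyms=NO_SYNONYMS):
    """Map a captured check box value to "Yes" or "No", or None if unrecognized."""
    token = captured.strip().lower()
    if token in yes_synonyms:
        return "Yes"
    if token in no_synonyms:
        return "No"
    return None  # unknown value: a candidate for human validation
```

Returning `None` for unrecognized values keeps the mapping from silently guessing, which fits the idea of routing uncertain results to validation.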

Exporting and Importing Templates

You can import templates created and exported from other workflows.

Exporting Procedure

Here are the steps you need to follow to export a template:

  1. Create one or more templates by following the steps explained in the Creating a template section.
  2. Select the templates you want to export.
  3. Select an Export option (with or without the original files). Exporting with the original files attaches the files used for template creation to the export; exporting without them does not.
  4. Save the template archive with the desired name.
  5. A message is displayed once the template is saved. Select the OK button.

Importing Procedure

Here are the steps you need to follow to import a template:

  1. Select the Import button.
  2. Select an archive. The import wizard appears and presents all document types and all templates available in the selected export archive. Select the templates you wish to import and choose the appropriate Import option (with or without the original files).

Note:

When templates are imported, document types are created automatically in the project's Taxonomy. If a document type with the same name already exists, another one is created by appending a count to the document type name.
If you are importing templates that have been exported without the original files, or if you choose to import templates without the original files, then you have no view or edit options for those templates.
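
The renaming rule for document types that already exist can be sketched as follows. The exact suffix format used by the import wizard is not specified here, so appending the bare number (for example, `Invoice1`) is an assumption, and the function name is hypothetical.

```python
def unique_document_type_name(name, existing_names):
    """Return `name`, or `name` plus the first free count if it is already taken.

    The suffix format (bare number, starting at 1) is an assumption.
    """
    if name not in existing_names:
        return name
    count = 1
    while f"{name}{count}" in existing_names:
        count += 1
    return f"{name}{count}"
```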

Configuring a template with table selection

Once the Form Extractor is set up, you can edit the template. A Template Manager window opens for configuring the fields. See the Validation Station activity documentation for more instructions.

When using Selection Mode, only specific fields become available for adding information. Some fields accept information only through tokens (like the Page Matching Info fields), and others only through a custom area (like the Table field).

Note:

If an empty area is selected, the selection is automatically set as Custom area. If text is detected inside the selected area, you are asked to choose the type of the selection between Tokens or Custom area.

You can also find out which type of selection each field accepts by checking the icon beside the field.

Note:

A Custom Selection defines the area from where the value can be extracted. If multiple selections are required, then the reported value is a collection of all words identified in all selections.
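
How several custom selections combine into one reported value can be sketched as below. The sketch assumes words and selections are `(left, top, right, bottom)` boxes and treats a word as belonging to a selection when the word's center point lies inside it. The center-point rule is a simplifying assumption, and the function name is hypothetical.

```python
def words_in_selections(words, selections):
    """Collect, in document order, every word whose center lies in any selection.

    `words` is a list of (text, (left, top, right, bottom)) pairs and
    `selections` is a list of (left, top, right, bottom) boxes.
    """
    collected = []
    for text, (left, top, right, bottom) in words:
        cx, cy = (left + right) / 2, (top + bottom) / 2
        for s_left, s_top, s_right, s_bottom in selections:
            if s_left <= cx <= s_right and s_top <= cy <= s_bottom:
                collected.append(text)
                break  # count each word once even if selections overlap
    return collected
```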

Table selection is now available in Template Manager.
