v20.3 Form Extractor - Preview

Release Date: 16th March 2020, Version: v4.4.0-preview

UiPath.IntelligentOCR.Activities.DataExtraction.FormExtractor

Extracts, matches, and reports the required information, taking into account the position of the words inside the document. This activity can be used only together with the Data Extraction Scope activity.

Properties

Common

  • DisplayName - The display name of the activity.

Input

  • MinOverlapPercentage - Specifies the minimum overlap, as a percentage, between a box in the document and a box in the template that is required for a value to be extracted. Accepted values are between 0 and 100. The default value is 65.
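The MinOverlapPercentage check can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the activity's implementation: it assumes boxes are axis-aligned rectangles given as (left, top, right, bottom) and that the overlap percentage is the intersection area divided by the area of the template box. The names Box, overlap_percentage, and is_extracted are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page coordinates (hypothetical representation)."""
    left: float
    top: float
    right: float
    bottom: float

    @property
    def area(self) -> float:
        return max(0.0, self.right - self.left) * max(0.0, self.bottom - self.top)


def overlap_percentage(document_box: Box, template_box: Box) -> float:
    """Intersection area as a percentage of the template box area (assumed formula)."""
    width = min(document_box.right, template_box.right) - max(document_box.left, template_box.left)
    height = min(document_box.bottom, template_box.bottom) - max(document_box.top, template_box.top)
    if width <= 0 or height <= 0 or template_box.area == 0:
        return 0.0  # the boxes do not intersect
    return 100.0 * (width * height) / template_box.area


def is_extracted(document_box: Box, template_box: Box, min_overlap_percentage: float = 65.0) -> bool:
    """Extract only if the overlap meets the threshold (mirrors the default of 65)."""
    if not 0 <= min_overlap_percentage <= 100:
        raise ValueError("min_overlap_percentage must be between 0 and 100")
    return overlap_percentage(document_box, template_box) >= min_overlap_percentage
```

With a 10 x 10 template box, a document box shifted 3 units to the right overlaps it by 70% and passes the default threshold of 65, while a shift of 4 units gives 60% and fails.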

Misc

  • Private - If selected, the values of variables and arguments are no longer logged at Verbose level.
  • SerializedTemplates - Defines the serialized form of the component's templates, which can afterwards be generated automatically on demand.

Using the Template Manager Wizard

This wizard allows you to create, edit, export, and import templates for the document types defined in the taxonomy.

Creating a template

  1. Add a Form Extractor activity to your workflow.
  2. Configure the extractor by clicking the Manage Templates button.
    • The Template Manager window opens.
  3. Click the Create Template button to create a new template.
  4. Select the desired type from the Document Type drop-down list.

Note:

All Document Types are based on the Taxonomy. Make sure to add or create a Taxonomy inside the project's folder.

  5. Enter the name of the template in the Template name field.
  6. Enter the path of the document in the Template document field.
    • Alternatively, use the Browse button to navigate to the file.
  7. Select an OCR engine from the OCR Engine drop-down list.
  8. Select the language of the document from the Languages drop-down list.
  9. Select the profile type of the OCR engine from the Profile drop-down list.
  10. Enter a value in the Scale field.
  11. Click the Configure button to confirm and save the template.
    If a template already exists, you can choose to Edit or Remove it.
    Each OCR engine comes with its own set of custom options. See the tables below for details:

Note:

Once a table is defined in the Taxonomy, a new table field becomes available in the Template Manager, where you can define the table and its area.

Microsoft OCR

  • Languages - Select one of the available languages.
  • Scale - Set the scale value of the document.
  • Get Words Info - Specifies whether digitization is done at the word or letter level.

Tesseract OCR

  • Languages - Select one of the available languages.
  • Profile - Select the profile type of the OCR engine. The default value is Screen.
  • Scale - Set the scale value of the document.
  • Invert - If selected, inverts the colors of the UI elements before scraping. This is useful when the background is darker than the text.

OmniPage OCR

  • EnginePack - Select the type of engine pack.
  • Languages - Select one of the available languages.
  • Profile - Select the profile type of the OCR engine. The default value is Screen.
  • Scale - Set the scale value of the document.

If you have already created a template, you can select it to use, edit, export, or remove it.
The Delete and Export buttons become available only when a template is selected.

For documents that include check boxes, you can add known synonyms for the Yes and No options, or you can use the default synonyms. After the template runs, a confidence percentage is displayed to help you decide whether human validation is required.
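The check box handling described above can be illustrated with a small sketch, assuming the recognized check box value arrives as plain text. The synonym sets, the 80% threshold, and the function names are hypothetical; the actual default synonyms and the point at which you choose human validation are not specified here.

```python
from typing import Iterable, Optional

# Hypothetical synonym sets; the real defaults offered by the Template Manager may differ.
DEFAULT_YES = {"yes", "y", "true"}
DEFAULT_NO = {"no", "n", "false"}


def normalize_checkbox(raw_text: str,
                       yes_synonyms: Iterable[str] = DEFAULT_YES,
                       no_synonyms: Iterable[str] = DEFAULT_NO) -> Optional[bool]:
    """Map recognized check box text to True (Yes), False (No), or None if unrecognized."""
    token = raw_text.strip().lower()
    if token in {s.lower() for s in yes_synonyms}:
        return True
    if token in {s.lower() for s in no_synonyms}:
        return False
    return None


def needs_human_validation(confidence_percentage: float, threshold: float = 80.0) -> bool:
    """Route the result to human validation when confidence is below a user-chosen threshold."""
    return confidence_percentage < threshold
```

Custom synonyms replace the defaults, so a French form could be handled with normalize_checkbox("Oui", yes_synonyms={"oui"}, no_synonyms={"non"}).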

Importing a template

You can import and use a template only if you have previously created and exported one.
Here are the steps you need to follow to export and then import a template:

  1. Create a template by following the steps explained at the beginning of this page.
  2. Export your template by selecting the Export button.
  3. Save the template's archive with the desired name.
  4. A confirmation message is displayed once the template is saved. Select the OK button.
  5. You can now start importing your template by selecting the Import button.
  6. Open the previously saved template and select the Import button with the desired option.

Configuring a template with table selection

Once the Form Extractor is set up, you can edit the template. A Template Manager window opens, where you can configure the fields. Refer to the Validation Station page for instructions.
Here is what the process should look like after the template is configured:

When using Selection Mode, only some fields are available for adding information. Some fields accept information only as tokens (such as the Page Matching Info fields), while others accept only a custom area (such as the Table field). The GIF below illustrates the difference between the two types of selection:

Note:

If an empty area is selected, the selection is automatically set as Custom area. If text is detected inside the selected area, you are asked to choose between Tokens and Custom area as the selection type.
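The decision rule from the note above can be sketched as follows. This assumes words and areas are rectangles given as (left, top, right, bottom) and that a word counts as detected when its box lies entirely inside the selection; the real containment rule is not documented here. ask_user stands in for the Template Manager prompt, and all names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (left, top, right, bottom), hypothetical layout


class SelectionType(Enum):
    TOKENS = "Tokens"
    CUSTOM_AREA = "Custom area"


@dataclass(frozen=True)
class Word:
    text: str
    box: Rect


def words_in_area(words: Sequence[Word], area: Rect) -> List[Word]:
    """Words whose box lies entirely inside the selected area (assumed containment rule)."""
    left, top, right, bottom = area
    return [w for w in words
            if left <= w.box[0] and top <= w.box[1] and w.box[2] <= right and w.box[3] <= bottom]


def resolve_selection_type(words: Sequence[Word], area: Rect,
                           ask_user: Callable[[List[Word]], SelectionType]) -> SelectionType:
    detected = words_in_area(words, area)
    if not detected:
        # Empty area: no prompt, the selection becomes a Custom area automatically.
        return SelectionType.CUSTOM_AREA
    # Text detected: the user chooses between Tokens and Custom area.
    return ask_user(detected)
```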

You can also identify the type of selection each field accepts by checking the icon next to the field, as shown in the GIF below:

Note:

A Custom Selection defines the area from which the value can be extracted. If multiple selections are required, the reported value is the collection of all words identified in all selections.
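Here is a minimal sketch of how several custom selections could combine into one reported value. It assumes a word belongs to a selection when the center of its box falls inside it and that words are supplied in reading order. These rules and all names are illustrative assumptions, not the extractor's documented behavior.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (left, top, right, bottom), hypothetical layout


@dataclass(frozen=True)
class Word:
    text: str
    box: Rect


def _center_inside(box: Rect, area: Rect) -> bool:
    """True if the center of the word box lies inside the selected area."""
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    return area[0] <= cx <= area[2] and area[1] <= cy <= area[3]


def collect_words(words: Sequence[Word], selections: Sequence[Rect]) -> List[str]:
    """Text of every word found in any selection, in input order, each word at most once."""
    return [w.text for w in words if any(_center_inside(w.box, area) for area in selections)]
```

Because each word is tested once against all selections, a word covered by two overlapping selections is reported only once.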

Table selection is now available in the Template Manager. See the GIF below to learn how to select a table:
