v20.1 Position Based Extractor - Preview

Release Date: 20th Jan 2020, Version: v20.1

UiPath.IntelligentOCR.Activities.DataExtraction.PositionBasedExtractor

Extracts, matches, and reports the required information by taking into consideration the words' position inside the document. This activity can be used only together with the Data Extraction Scope activity.

Properties

Common

  • DisplayName - The display name of the activity.

Input

  • MinOverlapPercentage - Specifies the minimum overlap area (in percentage) between a box in the document and a box in the template required to make an extraction. The percentage value can be set between 0 and 100. The default value is 50.
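To make the threshold concrete, here is a minimal Python sketch of how an overlap percentage between a template box and a document box could be computed and compared against MinOverlapPercentage. This is an illustration only: the `Box` type, the function names, and the choice to measure the overlap relative to the document box's area are all assumptions, since this page does not state how the activity computes the overlap internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned rectangle; (left, top) is the upper-left corner."""
    left: float
    top: float
    width: float
    height: float

    @property
    def area(self) -> float:
        return self.width * self.height


def overlap_percentage(template_box: Box, document_box: Box) -> float:
    """Return the share (0 to 100) of document_box covered by template_box.

    Assumption: the overlap is measured against the document box's area.
    The activity may use a different denominator.
    """
    x_overlap = (min(template_box.left + template_box.width,
                     document_box.left + document_box.width)
                 - max(template_box.left, document_box.left))
    y_overlap = (min(template_box.top + template_box.height,
                     document_box.top + document_box.height)
                 - max(template_box.top, document_box.top))
    # No intersection, or a degenerate document box, counts as zero overlap.
    if x_overlap <= 0 or y_overlap <= 0 or document_box.area == 0:
        return 0.0
    return 100.0 * x_overlap * y_overlap / document_box.area


def is_extracted(template_box: Box, document_box: Box,
                 min_overlap_percentage: float = 50.0) -> bool:
    """Apply the threshold; 50 mirrors the documented default value."""
    if not 0 <= min_overlap_percentage <= 100:
        raise ValueError("min_overlap_percentage must be between 0 and 100")
    return overlap_percentage(template_box, document_box) >= min_overlap_percentage


template = Box(left=0, top=0, width=100, height=20)
word = Box(left=90, top=0, width=20, height=20)
print(overlap_percentage(template, word))
print(is_extracted(template, word))
print(is_extracted(template, word, min_overlap_percentage=60))
```

In this sketch, half of the word box lies inside the template box, so the word is kept at the default threshold of 50 and dropped once the threshold is raised to 60. A lower threshold tolerates documents that are shifted relative to the template, while a higher one reduces the chance of picking up neighboring words.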

Misc

  • Private - If selected, the values of variables and arguments are no longer logged at Verbose level.
  • SerializedTemplates - Defines the serialization of the component's template, which can afterward be generated automatically on demand.

Using the Position Based Extractor Template Manager Wizard

This wizard allows you to create, edit and export/import templates for the document types defined in the taxonomy.

Creating a template

  1. Add a Position Based Extractor activity to your workflow.
  2. Configure the extractor by clicking the Manage Templates button.
    • The Position Based Extractor Template Manager window opens.
  3. Click the Create Template button to create a new template.
  4. Select the desired type from the Document Type drop-down list.

Note:

All Document Types are based on the Taxonomy. Make sure to add or create a Taxonomy inside the project's folder.

  5. Add the name of the template in the Template name field.
  6. Add the document's path in the Template document field.
    • Navigate to the file's path by using the Browse button.
  7. Select an OCR engine from the OCR Engine drop-down list.
  8. Select the language of the document from the Languages drop-down list.
  9. Add a value in the Scale field.
  10. Select the Get Words Info check box if you want the digitization to be done at the word level rather than the letter level.
  11. Click the Configure button to confirm and save the template.

If a template already exists, you can choose to Edit or Remove it.

Each OCR engine comes with its own set of custom options. The options for each engine are listed below:

Microsoft OCR

  • Languages - Select one of the available languages.
  • Scale - Set up the scale value of the document.
  • Get Words Info - Specify if the digitization should be done at the word or letter level.

Tesseract OCR

  • Languages - Select one of the available languages.
  • Characters - Select the type of the characters.
  • Scale - Set up the scale value of the document.
  • Invert
  • Get Words Info - Specify if the digitization should be done at the word or letter level.

OmniPage OCR

  • Bundle Type
    • Basic - includes only some selected options
    • Extended - includes all options
  • Languages - Select one of the available languages.
  • Profile - Select the profile type of the OCR engine. The default value is Screen.
  • Scale - Set up the scale value of the document.

Once a template exists, you can select it to use, edit, export, or remove it.
The Delete and Export buttons become available only when a template is selected.

Importing a template

You can import and use a template only if you previously created and exported one.
Follow these steps to export and then import a template:

  1. Create a template by following the steps in the Creating a template section.
  2. Select the template and click the Export button.
  3. Save the template's archive with the desired name.
  4. A pop-up message is displayed once the template is saved. Select the OK button.
  5. Start importing your template by selecting the Import button.
  6. Open the previously saved template and select the Import button with the desired option.

Running the workflow

Once the Position Based Extractor is configured and the template is customized, you can run the workflow. A wizard appears for validating the extracted data. To complete the validation, follow the Validation Station instructions.
