The Intelligent Form Extractor is a specialized tool for extracting data from fixed-layout documents. It builds on the Form Extractor and adds extra capabilities, such as
- handwriting recognition and handwritten data extraction,
- signature detection.
The additional features of Intelligent Form Extractor, when compared to Form Extractor, make it a very good fit for processing all types of forms that
- may be printed OR handwritten,
- may require checking whether the form is signed or not.
These two additional features are configurable from the activity's Template Manager wizard, in addition to the configurations already present in the Form Extractor.
This extractor does not have learning (training) capabilities and requires up-front configuration.
You need to use your Automation Cloud Document Understanding API Key, or host your own instance of the Intelligent Form Extractor in AI Center on-prem, to use this extractor.
The Intelligent Form Extractor has two major configurations to be considered:
- the Template Manager wizard - that allows you to define templates to be applied to incoming documents. This wizard enables the Template Editor and the Boolean field interpretation settings.
- the MinOverlapPercentage setting - that allows you to control how strict the value area matching should be. It accepts a percentage value of up to 100 and controls which words are accepted into or rejected from a given value, based on how well their location fits the area defined in the template.
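As a rough illustration of this kind of overlap-based filtering, the Python sketch below keeps a word only when a large enough share of its bounding box falls inside the template's value area. The `Box` type, the `overlap_percentage` and `filter_words` helpers, and the overlap formula itself are assumptions made for illustration; the extractor's internal computation is not described on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle (hypothetical type, not a UiPath API)."""
    left: float
    top: float
    right: float
    bottom: float

    @property
    def area(self) -> float:
        return max(0.0, self.right - self.left) * max(0.0, self.bottom - self.top)


def overlap_percentage(word: Box, value_area: Box) -> float:
    """Share of the word box, as a percentage, that lies inside the value area."""
    inter = Box(
        max(word.left, value_area.left),
        max(word.top, value_area.top),
        min(word.right, value_area.right),
        min(word.bottom, value_area.bottom),
    )
    return 100.0 * inter.area / word.area if word.area else 0.0


def filter_words(words, value_area, min_overlap_percentage):
    """Keep the text of (text, box) pairs that meet the assumed threshold."""
    return [
        text for text, box in words
        if overlap_percentage(box, value_area) >= min_overlap_percentage
    ]


area = Box(0, 0, 100, 20)
words = [("John", Box(10, 5, 40, 15)), ("Smith", Box(90, 5, 130, 15))]
strict = filter_words(words, area, 80)   # only words almost fully inside
lenient = filter_words(words, area, 20)  # also words that barely touch the area
```

In this sketch a higher threshold rejects words that only partially reach into the value area, while a lower threshold accepts them.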
The Template Manager allows you to create, edit, manage, and export/import templates for the document types defined in the taxonomy.
- Add an Intelligent Form Extractor activity to your workflow, within a Data Extraction Scope.
- Configure the extractor by clicking on the Manage Templates button.
- The Template Manager window opens.
- Click the Create Template button for creating a new template.
- Select the document type for your template from the Document Type dropdown list.
All Document Types are based on the Taxonomy. Make sure to add or create a taxonomy inside the project's folder.
- Add the name of the template in the Template name field. Choose a relevant name that reflects the version or the layout of your document.
- Add the document's path in the Template document field.
- Navigate to the file's path by using the Browse button.
- Select an OCR engine from the OCR Engine dropdown list, and configure it as needed.
- Click the Configure button to trigger the template editing.
The OCR engine is applied only if necessary. If the document selected for building a template is a Native PDF, then no OCR engine is executed, unless the Force Apply OCR option is checked. If checked, the OCR is applied even on a native PDF file.
Each OCR engine comes with its own set of custom options. For details about the options available for each engine, see the documentation of that OCR engine.
If you already created a template, then it can be edited, exported, or removed.
The Delete and Export buttons become available only when at least one template is selected. The Edit and Remove options for an individual template are always available.
If a field is checked in both Signature and Handwritten boxes in the Template Manager of the Intelligent Form Extractor activity, then a popup message appears informing you that a field can be added only in one box, not both.
For the documents that include checkboxes, you can add known synonyms for the Yes and No options, or you can start from a list compiled by us (see the Add Recommended suggestions). These values are used for Boolean content interpretation, which is mapping a captured value to a Yes or No reported value.
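The Boolean content interpretation described above can be pictured as a simple synonym lookup. The Python sketch below is illustrative only: the synonym sets and the `interpret_boolean` helper are hypothetical and do not reproduce the recommended list offered by the wizard.

```python
from typing import Optional

# Hypothetical synonym lists; the recommended list in the wizard may differ.
YES_SYNONYMS = {"yes", "y", "x", "true", "checked"}
NO_SYNONYMS = {"no", "n", "false", "unchecked"}


def interpret_boolean(captured: str) -> Optional[str]:
    """Map a captured value to "Yes" or "No"; return None if no synonym matches."""
    normalized = captured.strip().lower()
    if normalized in YES_SYNONYMS:
        return "Yes"
    if normalized in NO_SYNONYMS:
        return "No"
    return None
```

For example, under these assumed lists a captured `" X "` would be reported as `Yes`, while an unknown value such as `"maybe"` would stay uninterpreted.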
You can import templates created and exported from other workflows. Use these features to share templates between projects. Once a document type is configured using the Intelligent Form Extractor, you don't need to reconfigure the templates in a new implementation.
Here are the steps you need to follow to export a template:
- Create one or more templates by following the steps explained at the beginning of this page.
- Select the templates you want to export.
- Select an Export option (with or without the original files). Exporting with original files attaches them to the export.
- Save the template's archive with the desired name.
- A message is displayed once the template is saved. Select the OK button.
With or Without Original Files?
If you cannot share the content of the documents you have built your templates on, then use the "Without Original Files" option. You are still able to share and import the template archive in other projects, but you cannot edit or view them anymore.
If you want to edit the templates once imported in a different project, make sure to use the "With Original Files" option when exporting and then importing them.
Here are the steps you need to follow to import a template:
- Select the Import button.
- Select an archive. The import wizard appears and presents all document types and all templates available in the selected export archive. Select the templates you wish to import and choose the right Import option (with or without the original files).
When templates are imported, Document Types are created automatically in the project's Taxonomy. If a Document Type with the same name already exists, another one is created by appending a count to the Document Type name.
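The naming rule for colliding Document Types can be sketched as follows. This is a minimal Python illustration, not the product's implementation: the `unique_document_type_name` helper is hypothetical, and the exact format of the appended count (here a bare number starting at 1) is an assumption.

```python
def unique_document_type_name(name: str, existing: set) -> str:
    """Return `name`, or `name` plus a count if it is already taken.

    The suffix format is assumed for illustration only.
    """
    if name not in existing:
        return name
    count = 1
    while f"{name}{count}" in existing:
        count += 1
    return f"{name}{count}"
```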
If you are importing templates that have been exported without the original files, or if you choose to import templates without the original files, then you have no view or edit options for those templates.
When a template is imported, several special situations might occur. The list below explains each situation and its particularities:
- New document type: if a new document type is imported, a new field is added in the wizard configurator, informing you that a new template is to be created.
- Duplicate document type: if an identical document type is imported, a warning message appears.
- Extended document type: if you import a document type that includes more fields than the existing one, a warning message appears.
- Document type with identical name but different content: if you import a document type that has the same name as the existing one but different fields, a warning message appears.
- Document type with missing table: if you import a document type that doesn't include a table, a warning message appears.
- Document type with extended table: if you import a document type that includes a table with extra columns, a warning message appears.
- Document type with reduced table: if you import a document type that includes a table with missing columns, a warning message appears.
- Table template with different document types: if you import a document type template that includes a table with different document types, a new template is created.
The Template Editor builds on the functionality present in the Validation Station. Access it by clicking on the button of a template.
To learn about the basic usage of the Validation Station, read this section.
Besides the options available in the right part of the Validation Station screen, there are two options specific to the Template Editor:
- one that sets the anchor selection mode,
- one that clears the whole anchor selection.
When creating a new template, an explanation text appears when first opening the Template Editor. In case you want to access the text again, follow the steps below:
Anchors can be defined once the Template Editor is opened from the Template Manager and can be found among the Selection Mode options.
When defining or editing a page-level template, the first thing that needs to be performed is the Page 1 Matching Info selection, for fixed form template definition.
Situated on the left side of the screen, the Page 1 Matching Info selection requires a text input (only tokens are accepted) from the first page of the template. The selected text must always be in the same position within that particular template layout, and must form a unique graph of words (considering relative distances and angles between words) across all the templates defined for a particular document type.
In other words, the Page 1 Matching Info (and all other Page Matching Info fields) are "fingerprints" of a particular page and are extensively used in identifying the right matching template at runtime.
For this reason, for the Page 1 Matching Info field, it is strongly recommended to select 10 to 20 words, preferably longer in length, spread across the entire page area.
The other Page Matching Info fields (one for each template page) must be filled in only if you are attempting data extraction from that particular page, and do not require cross-template uniqueness anymore. If no fields need to be extracted from a particular page, defining the page-level matching info for that page is not mandatory.
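To make the "graph of words" idea more concrete, the Python sketch below describes a set of selected words by the distance and angle between each pair of word centers. It is only an illustration of why such a description can act as a page fingerprint: the `word_graph` helper, the sample coordinates, and the rounding are all assumptions, and the extractor's actual matching logic is not described on this page.

```python
import math
from itertools import combinations


def word_graph(words):
    """Describe selected words by the distance and angle between each pair.

    `words` maps a word to the (x, y) center of its box on the page.
    """
    graph = {}
    for (a, (ax, ay)), (b, (bx, by)) in combinations(sorted(words.items()), 2):
        distance = math.hypot(bx - ax, by - ay)
        angle = math.degrees(math.atan2(by - ay, bx - ax))
        graph[(a, b)] = (round(distance, 1), round(angle, 1))
    return graph


# Hypothetical word positions selected as Page 1 Matching Info.
template = {"Invoice": (50, 40), "Total": (400, 700), "Date": (450, 40)}
# The same page scanned with a uniform shift of (+12, -7).
shifted = {w: (x + 12, y - 7) for w, (x, y) in template.items()}
```

Because only relative distances and angles are stored, `word_graph(template)` and `word_graph(shifted)` are equal in this sketch. Selecting more words, spread across the page, yields more pairs and therefore a more distinctive description.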
For all fields other than Tables, configuring the template consists of selecting a Custom Area and assigning it to a particular field.
For fixed form configurations, data fields can only be configured using Custom Area selections.
For a field you can define one or more such Custom Areas, using the (+) button. If two or more Custom Areas are defined for a single field, then at runtime, if the field is defined in the Taxonomy as Single Value, all values are concatenated into a single reported value. If the field is defined as Multi Value, then each value is reported individually.
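The difference between the two reporting modes can be sketched in a few lines of Python. The `report_field` helper is hypothetical, and joining the values with a single space is an assumption; the separator actually used for concatenation is not stated here.

```python
def report_field(area_values, is_multi_value):
    """Combine the values captured from a field's Custom Areas."""
    if is_multi_value:
        # Multi Value field: each captured value is reported individually.
        return list(area_values)
    # Single Value field: all captured values become one reported value.
    return [" ".join(area_values)]
```

Under these assumptions, two Custom Areas capturing `"12"` and `"Main St"` would produce one value, `"12 Main St"`, for a Single Value field and two separate values for a Multi Value field.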
The icon beside each field indicates the type of selection (Tokens or Custom Area) that the field supports.
If an empty area is selected, the selection is automatically set as Custom area. If text is detected inside the selected area, you are asked to choose the type of the selection between Tokens or Custom area.
Use the Validation Station selection mode feature to lock your selection between Tokens and Custom Areas.
As mentioned above, there are fields where information can be added only by using Tokens (like the Page Matching Info fields) or only by using a Custom Area (like simple fields). For Table fields, you can:
- define each cell one by one, once the Table Editor is expanded - by adding Custom Area selection to each cell individually, or
- use the table markup functionality - by marking the table area, drawing row and column separators, and then assigning the marked table to the field. Make sure that the extracted area has the same number of columns and rows as the template area.
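Conceptually, the table markup approach turns one marked area plus its separators into a grid of cell areas. The Python sketch below shows that idea under stated assumptions: the `table_cells` helper and the coordinate convention (left, top, right, bottom) are hypothetical and are not part of the Template Editor.

```python
def table_cells(left, top, right, bottom, row_separators, column_separators):
    """Split a marked table area into cell rectangles, row by row."""
    ys = [top, *sorted(row_separators), bottom]
    xs = [left, *sorted(column_separators), right]
    return [
        [(xs[c], ys[r], xs[c + 1], ys[r + 1]) for c in range(len(xs) - 1)]
        for r in range(len(ys) - 1)
    ]


# Two row separators and one column separator give 3 rows x 2 columns.
cells = table_cells(0, 0, 300, 90, row_separators=[30, 60], column_separators=[100])
```

Each resulting rectangle plays the role of the Custom Area that would otherwise be drawn cell by cell.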
A distinctive method of defining the bounds of a custom area from which data is to be extracted is to use field-level anchors. These allow for targeting data extraction based on field-level configurations, thus allowing for more flexibility when defining your form extraction rules.
Consequently, at run-time the Intelligent Form Extractor knows how to:
- identify if a page-level template matches, and extract information according to the best page-level template match it recognizes;
- identify if any anchor-based settings match, and extract information according to their application in the document to be processed;
- compute appropriate confidence scores for all possible matches, to be able to report the best result (highest probability match) of all available options.
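The last step, reporting the highest-probability match, can be sketched as a selection over candidate results. The `Candidate` type and `best_candidate` helper below are hypothetical, and how the extractor actually computes its confidence scores is not covered here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One possible extraction result (hypothetical structure)."""
    source: str        # e.g. "page-level template" or "field-level anchor"
    value: str
    confidence: float  # assumed to be comparable across sources


def best_candidate(candidates):
    """Return the highest-confidence candidate, or None when there are none."""
    return max(candidates, key=lambda c: c.confidence, default=None)


candidates = [
    Candidate("page-level template", "INV-001", 0.72),
    Candidate("field-level anchor", "INV-0001", 0.91),
]
winner = best_candidate(candidates)
```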
- Make sure you are in the Anchor Selection mode.
- Draw a box around the value area.
- Select a Label (main anchor) for your value area, either by clicking the first word and then Ctrl+clicking the last word of the selection, or by clicking, dragging, and then releasing to capture a word range.
A Label can only contain consecutive words from the same visual line.
- Select any additional anchors that would uniquely identify your Label. The same selection principle applies.
- Assign your anchor construct to the appropriate field by selecting Extract Value for that particular field.
- Highlight your anchor setting.
- Make changes to it (delete any anchors, the label, even the value area if you wish, add new elements, etc.).
- Use the Change Extracted Value option to update your field association.
When Creating or Editing an Anchor Setting
If you delete the target area, all anchors get deleted and you start over.
If you delete the Label (main anchor), the first anchor in the order it was created becomes the new Label.
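These two editing rules can be modeled with a small data structure. The Python sketch below is an assumed model for illustration only; the `AnchorSetting` class and its fields are hypothetical and do not mirror the Template Editor's internals.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AnchorSetting:
    """Hypothetical model of one anchor setting."""
    target_area: Optional[tuple] = None
    label: Optional[str] = None
    anchors: List[str] = field(default_factory=list)  # kept in creation order

    def delete_target_area(self) -> None:
        # Deleting the target area discards the Label and all anchors.
        self.target_area = None
        self.label = None
        self.anchors.clear()

    def delete_label(self) -> None:
        # The first anchor, in creation order, becomes the new Label.
        self.label = self.anchors.pop(0) if self.anchors else None


setting = AnchorSetting((190, 195, 400, 220), "Invoice No", ["Date", "Supplier"])
setting.delete_label()
```

After `delete_label()` in this example, `"Date"` is the Label and `"Supplier"` is the only remaining additional anchor.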
To delete an anchor setting, you can:
- use the Mark as Missing option for a saved value, or
- use the Remove Value option if a list of anchors is defined for a given field.
You can define as many templates as you want for the same document type. You can have multiple page-level templates, multiple anchors for the same field, even templates containing both page-level as well as field-level anchors.
Tips and Tricks
When defining field-level anchors make sure your Label is close to your value area and it is supported by additional anchors if the same text construct can be found in multiple places within the same document.
The lengthier your labels and anchors are, the more precision you get.
The value area is always computed based on its relative position against your Label (main anchor). Choose your main anchors accordingly.
Having field-level anchors allows fields to move within the template and still be captured, offering more flexibility in document layout changes.
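The idea that the value area follows the Label can be sketched as storing an offset when the template is defined and re-applying it where the Label is found at runtime. The helpers and coordinates below are hypothetical, boxes are assumed to be (left, top, right, bottom) tuples, and the extractor's real geometry handling may differ.

```python
def relative_offset(label_box, value_box):
    """Offset of the value area from the Label's top-left corner (template time)."""
    lx, ly, _, _ = label_box
    vx1, vy1, vx2, vy2 = value_box
    return (vx1 - lx, vy1 - ly, vx2 - lx, vy2 - ly)


def locate_value_area(label_box, offset):
    """Re-apply the stored offset to the Label position found at runtime."""
    lx, ly, _, _ = label_box
    dx1, dy1, dx2, dy2 = offset
    return (lx + dx1, ly + dy1, lx + dx2, ly + dy2)


# Template: Label at (100, 200), value area just to its right.
offset = relative_offset((100, 200, 180, 215), (190, 195, 400, 220))
# Runtime: the same Label appears 150 units lower on the page.
moved = locate_value_area((100, 350, 180, 365), offset)
```

In this sketch the value area moves down together with the Label, which is why a Label close to the value area, and unambiguous within the document, matters.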