Release Date: 29 March 2021
Extended the Form Extractor and Intelligent Form Extractor capabilities by adding field-level, anchor-based extraction rules. Besides page-level anchors, field-level anchors can now be defined in Template Editor, which offers a new option for defining the bounds of a custom area from which data is to be extracted. Unlike page-level configurations, which define data positions with respect to the entire page content, anchor-based configurations target extraction at the level of individual fields, allowing for more flexibility.
Performance improvements on Validation Station.
Updated Validation Station and Classification Station design system for a better user experience.
The Validation Station, Classification Station, and Template Manager now have a three-state button, on the Document View side, that allows users to choose between different document interaction modes: Tokens (word selections), Custom area (area selection), and Choice on selection (users can choose between Tokens and Custom Area at each selection).
The user interfaces, Validation Station, Classification Station, and Template Manager, have been improved with a new selection mode in text view, now allowing users to perform selections from the text version of a document in the same way they interact with the original version. A new hotkey, d+s, was also added, to assist in switching between the original document view and the text view modes.
The Validation Station now displays a "crop" from the original document, when you assign a value to a data field, under the reported text value selected. This helps with locating and verifying a specific field value against the value area in the document.
Changed confidence calculation for Intelligent Keyword Classifier to be scalable with the length of the word vectors.
Added the IncludeOCRConfidence checkbox to the properties panel of the Export Extraction Results activity. If selected, the exported information will contain OCR Confidence for each value as well.
Improved letter and word processing algorithms to avoid reporting duplicate characters or words in certain situations.
Classify Document Scope and Train Classifiers Scope now support classifier capabilities.
Classify Document Scope has been optimized to perform sequential calls to the classifiers in its scope, passing each one only the page ranges that were not already classified by a previous classifier.
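The sequential behavior described above can be pictured with a small Python sketch. It is only an illustration of the idea, not the activity's implementation: the `Classifier` signature, `classify_sequentially`, and the two toy classifiers are hypothetical names introduced here.

```python
from typing import Callable, Dict, List, Set

# Hypothetical classifier signature: receives the page numbers it may inspect
# and returns {page number: document type} for the pages it could classify.
Classifier = Callable[[Set[int]], Dict[int, str]]


def classify_sequentially(pages: Set[int], classifiers: List[Classifier]) -> Dict[int, str]:
    """Call classifiers in order, passing each one only the still-unclassified pages."""
    results: Dict[int, str] = {}
    remaining = set(pages)
    for classifier in classifiers:
        if not remaining:
            break  # everything is classified, so later classifiers are skipped
        found = classifier(set(remaining))
        for page, doc_type in found.items():
            if page in remaining:  # ignore pages outside the requested range
                results[page] = doc_type
        remaining -= set(results)
    return results


def keyword_classifier(pages: Set[int]) -> Dict[int, str]:
    # Toy classifier (assumption): only recognises pages 1 and 2.
    return {p: "Invoice" for p in pages if p in {1, 2}}


def fallback_classifier(pages: Set[int]) -> Dict[int, str]:
    # Toy classifier (assumption): labels every page it is given.
    return {p: "Receipt" for p in pages}


result = classify_sequentially({1, 2, 3, 4}, [keyword_classifier, fallback_classifier])
```

In this sketch the second classifier is asked only about pages 3 and 4, because pages 1 and 2 were already handled by the first one.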
- Fixed an issue that threw a runtime error in specific cases when a Form Extractor activity and an Intelligent Form Extractor activity were in the same Data Extraction Scope.
- Fixed an issue that prevented classifier errors from being thrown in specific cases, causing classification to fail silently.
- Fixed an issue that caused derived parts not to be extracted for a number field when processing a specific document.
- Fixed an issue in Digitize Document, that caused the activity to process document pages even after an exception was reported, thus increasing the overall execution time for cases of failure.
- Fixed a bug that did not allow for the correct configuration of Regex expressions in Regex Based Extractor, in C# projects, and other very specific situations.
- Fixed a performance issue that appeared in Validation Station and Template Editor, when a document type contained more than 200 fields.
- Fixed a bug in which, in certain situations, numbers were merged into a single reported numeric value.
- Fixed an issue where, in certain situations, the Wait for Document Validation Action and Resume activity would throw an exception when communicating with storage buckets.
- The Create Document Classification Action / Create Document Validation Action and Wait for Document Classification and Resume / Wait for Document Validation and Resume do not work with storage buckets on which the Access Data Through Orchestrator has been enabled.
Release Date: 3 February 2021
Performance improvement of Validation Station.
Release Date: 11 January 2021
Improved file upload from Create Document Validation Action and Create Document Classification Action on AWS hosted storage bucket.
Release Date: 12 November 2020
CefSharp reference updated to version 84.4.10.
Updated Endpoints as follows:
- Form Extractor - from
- Intelligent Form Extractor - from
- Intelligent Keyword Classifier - from
Made improvements to Validation Station while in mark table mode.
- Fixed an issue for Create Document Classification Action related to the expiration of the Bearer Token.
Release Date: 20 October 2020
More detailed error logging for Form Extractor, Intelligent Form Extractor and Intelligent Keyword Classifier.
- Improved address parsing.
- Fixed an issue where not all ML Skills were usable by the Form Extractor, Intelligent Form Extractor, and Intelligent Keyword Classifier activities.
Release Date: 5 October 2020
Five new activities have been included in the package:
Present Classification Station - designed for classifying and separating files based on the document type.
To easily identify the information in the Validation Station, color codes were added to field cards and tokens or custom areas. Each field card has by default a color code, while tokens or custom areas get the same color code as the field card they are assigned to.
New shortcuts have been added to Validation Station allowing the user to move a selected line from a table up, down, left, or right. Also, when selections are made in Validation Station, these can be assigned to a specific field using field-level shortcuts. Each field card has a key associated with it. When no selections are made, you can use field-level shortcuts to jump from one field card to another.
For Validation Station table fields, a row-level checkmark was added. You can now confirm all the fields in a row by selecting the checkmark; it is also checked automatically once you have visited all the fields in the row.
Tokens in Validation Station have been updated. Thus, the highlighted tokens have a red bottom border and the selected tokens have a dashed border.
Field values with no reference are now supported in Validation Station. Users can assign values to fields that do not have a reference in the document. To do so, while the user creates a field in Taxonomy Manager, the Requires Reference checkbox needs to be unchecked.
New shortcuts were created for Classification Station allowing the user to navigate through document types; add, change, remove or highlight reference; move all pages up or down; split after selected page; discard changes; save; report as exception.
Besides using the document type ant menu, a reference can now be removed at page level as well by hovering over a page and clicking the blue icon in the bottom right corner. The icon also allows the user to highlight the reference.
The Rotate button was added to the PDF Viewer. By clicking the button, the current document page will rotate clockwise.
Selection mode is enabled by default in PDF Viewer.
The Intelligent Form Extractor and Form Extractor activities can now incorporate imported templates that have the same name but different content as the already available ones. Each template is analyzed, and a warning message is displayed for each case.
The ActionPriority property from the Create Document Validation Action activity now supports expressions and variables.
BucketFolderPath was renamed to BucketDirectoryPath for the Create Document Validation Action activity, and DirectoryFolderPath was renamed to DownloadDirectoryPath for the Wait for Document Validation Action and Resume activity. The change clearly separates these properties from the Orchestrator concept of “Folder”.
Release Date: 24 August 2020
- Fixed an issue that in some cases was returning a 407 Proxy Authentication Required error message for Kerberos or NTLM authentication requests. This applies to Form Extractor, Intelligent Form Extractor, and Intelligent Keyword Classifier.
- Fixed an issue that was causing the Intelligent Form Extractor to not properly display a timeout error.
- Missing translations were added for certain Validation Station strings.
- Fixed an issue that was causing the Data Extraction Scope to throw an error when unselecting a table field.
Release Date: 24 June 2020
- Fixed an issue that was causing synchronization problems between the configuration window and the Properties panel for the Wait for Document Validation Action and Resume activity.
- Fixed an issue where in some situations table fields were erroneously saved as multi-value in the taxonomy.
Release Date: 2 June 2020
The UiPath.IntelligentOCR.Activities package was updated to reference the latest UiPath Vision library.
Release Date: 4 May 2020
This release brings many new exciting activities such as Create Document Validation Action, and Wait for Document Validation Action and Resume that can be used to create, suspend, and resume orchestration workflows in the UiPath Action Center.
Two new extractors are here to be of your help. You can find them under the name of Form Extractor and Intelligent Form Extractor. Both activities can extract information from fixed form documents based on predefined templates, the difference being that the Intelligent Form Extractor can also be configured to interpret fields that are signed or handwritten. You can extract information from any type of field, including tables and create custom table extraction rules by using the Template Manager wizard.
While using the Intelligent Form Extractor activity, if the number of handwritten fields might have been exceeded, then a warning is displayed directly in the workflow. This does not stop the user from running the workflow.
The Regex Based Extractor activity received a new option named UseVisualAlignment that can be used for complex layouts where it is easier for users to write regular expressions based on how words are visually organized on lines, ignoring any sentence, paragraph, or layout group otherwise identified in the document.
You can define a regular expression for identifying the table area, a regular expression for identifying a table row in that area, and regular expressions for identifying specific columns in the table rows.
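As a rough illustration of this three-level approach (table area, then rows, then columns), here is a minimal Python sketch using the standard `re` module. The sample text, the pattern names, and the `extract_table` helper are assumptions made for the example; they do not reflect how the Regex Based Extractor is configured or implemented.

```python
import re

# Hypothetical sample document text.
text = """Invoice 1001
Items
Widget   2   9.50
Gadget   1   20.00
Total 39.00
"""

# Assumed patterns: one for the table area, one for a row, one per column.
TABLE_AREA = re.compile(r"Items\n(?P<body>.*?)(?=^Total)", re.DOTALL | re.MULTILINE)
ROW = re.compile(r"^.+$", re.MULTILINE)
COLUMNS = {
    "description": re.compile(r"^(\S+)"),
    "quantity": re.compile(r"^\S+\s+(\d+)"),
    "price": re.compile(r"(\d+\.\d{2})\s*$"),
}


def extract_table(document_text):
    """Locate the table area, split it into rows, then pull each column out of a row."""
    area = TABLE_AREA.search(document_text)
    if area is None:
        return []  # no table area found
    rows = []
    for row_match in ROW.finditer(area.group("body")):
        line = row_match.group(0)
        row = {}
        for name, pattern in COLUMNS.items():
            match = pattern.search(line)
            row[name] = match.group(1) if match else None
        rows.append(row)
    return rows


table = extract_table(text)
```

Keeping the three patterns separate means a change in the table boundaries does not require rewriting the row or column expressions.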
The Present Validation Station and its wizard come with many new and improved features.
The Validation Station wizard now has a new button named Discard changes. You can use it for confirming or dismissing any changes done in Validation Station. The function can be used on each document type individually.
The wizard also has a new option named Show Suggestions that allows you to select one value from multiple candidates if the used extractors report multiple possible values.
The list of shortcuts available in the Validation Station has been enriched with a new one, f+a, allowing you to add a new value in a multiple values field.
Improvements have been made on the Digitize Document activity that can now better identify the check boxes in a document.
This activity also has a new option named ForceApplyOCR. When selected, it applies the OCR engine to all the pages of the document, including native PDF.
The Data Extraction Scope activity can now automatically read Extractor capabilities (internal taxonomies) if the Extractor declares them. This simplifies the configuration step by exposing the extractor's known fields. The Machine Learning Extractor now supports this new functionality, making it very easy to use and configure.
The Export Extraction Results activity received a new option named IncludeConfidence. If selected, the confidence level is provided.
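The effect of such a flag can be sketched in Python as an optional extra column in the exported data. This is only an analogy: the `export_results` function, the CSV format, and the field dictionaries are hypothetical and say nothing about the activity's real output format.

```python
import csv
import io


def export_results(fields, include_confidence=False):
    """Write extracted fields as CSV, adding a confidence column only when requested."""
    columns = ["field", "value"]
    if include_confidence:
        columns.append("confidence")
    buffer = io.StringIO()
    # extrasaction="ignore" drops the confidence key when the column is not exported.
    writer = csv.DictWriter(
        buffer, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(fields)
    return buffer.getvalue()


# Hypothetical extraction result.
fields = [{"field": "Total", "value": "39.00", "confidence": 0.97}]

plain = export_results(fields)
with_confidence = export_results(fields, include_confidence=True)
```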
The extraction and configuration wizards now support bulk field selection for document types and table fields.
Release Date: 14 January 2020
- Fixed an issue that was causing the Validation Station wizard to display the table preferences incorrectly when using the Extract new table option.
- Fixed an issue that was returning an error when Validation Station wizard was run with Callout activity chained before or after it. Now, the activity runs as expected.
- Fixed an issue that was causing the Data Extraction Scope activity to throw an error when it was run with a customized machine culture and the FormatValuesIfPossible option selected. Now, the activity runs as expected.
- Fixed an issue that was causing some performance issues when large amounts of text were selected in the Text View option of the Validation Station wizard. Now, the Text View option displays the text as expected.
- Fixed an issue that was causing the Data Extraction Scope activity to throw an error when it was run with an extractor without an internal taxonomy set and a new field was added in the project’s taxonomy. Now, the activity runs as expected.
- On certain machines, rotated documents were not displayed properly when using the Validation Station.
Release Date: 6 December 2019
- Major updates occurred for the UiPath.IntelligentOCR.Activities package. All activities used for working with FineReader and FlexiCapture Abbyy product families were moved into a separate package named UiPath.Abbyy.Activities. This has led to a breaking change for the UiPath.IntelligentOCR.Activities package, which caused the version to skip ahead from v3.1.0 to v4.0.0. Here are the activities moved from the UiPath.IntelligentOCR.Activities package into the UiPath.Abbyy.Activities:
- The UiPath.Abbyy.Activities package cannot be used with versions lower than v19.11 for the UiPath.UIAutomation.Activities package and lower than v4.0.0 for the UiPath.IntelligentOCR.Activities package.
- If after updating a workflow to the new UiPath.IntelligentOCR.Activities v4.0.0 and UiPath.Abbyy.Activities v1.0.0 you encounter runtime validation errors, please force a new save on the .xaml file by making a small change and then reverting it. This might occur for workflows using FlexiCapture activities.
- Workflows created or upgraded to UiPath.IntelligentOCR.Activities v4.0.0 cannot be downgraded to a lower UiPath.IntelligentOCR.Activities version.
- An exception was thrown when the Digitize Document activity was used together with the OmniPage OCR for documents with special characters included in the Extended engine pack. The issue was fixed and now the activity is executed as expected.
Release Date: 25 November 2019
Improved performance when processing files within the document processing framework of the UiPath.IntelligentOCR.Activities package.
Release Date: 8 November 2019
New exciting features and improvements are brought to you with this release.
A new activity meant to help you better organize and manage your trainable classifiers is available: Keyword Based Classifier Trainer. This activity can be used only together with the Train Classifiers Scope activity.
The Validation Station wizard received an important upgrade and is now available for you to explore its maximum potential. This wizard becomes available only when the Present Validation Station activity is used in a workflow. You can use the upgraded version for benefiting from a new user-friendly interface, navigating through the document while using the keyboard shortcuts, or selecting one or multiple words or a custom area. You can easily mark a field as missing, extract new data, edit a table, or extract a new table. All these marvelous things can be done with the Validation Station wizard while using a dark theme.
One of the improvements included in this release is that the Keyword Based Classifier activity received a new parameter named LearningData. Besides specifying where the learning file data are located, you can now also use the string containing the serialized classifier data. This activity was enhanced with a wizard named Manage Keyword Based Classifier Learning that can be used for configuring and managing the keywords used for identifying specific document types.
Both the Keyword Based Classifier and Keyword Based Classifier Trainer activities are now able to manage multiple keywords. After the keyword sets are selected, the extraction is based on a full match of the selected words.
Another great improvement is that the DocumentObjectModel output, included in the Digitize Document activity, can now support word polygons, besides word horizontal boxes.
The Taxonomy Manager wizard received a new scrolling bar that incorporates all UI elements and it provides a better user experience.
The Data Extraction Scope, Train Extractors Scope, Train Classifiers Scope, and Classify Document Scope activities now arrange their extractors and classifiers horizontally instead of vertically.
The Regex Based Extractor activity has been improved and can now process and return multi-values. The output is visible only when the activity is used together with the Validation Station.
Four new languages, Turkish (TR), Portuguese (PT), Spanish (ES), and Spanish-Mexico (ES-MX) are available for the UiPath.IntelligentOCR.Activities package.
- Taxonomy Manager can be accessed only if you previously opened a .xaml file. If no files are opened when you access the Taxonomy Manager, a recording window is shown and Taxonomy Manager is displayed only after closing the recording window.
- An exception was thrown when using the Data Extraction Scope activity together with a Try Catch activity. The issue was fixed and now the activity is executed as expected.
- When a Boolean field was set to No in Validation Station, the output file showed the result as missing instead of No. The issue was fixed and now the output file shows the correct result.
- Fixed incorrect number parsing that occurred when the Data Extraction Scope was trying to parse numbers in documents using a different number format than the document's culture.
- When using multiple Validation Stations, the order of the derived parts was not respected in the validated results. The issue was fixed and now the results are displaying the derived parts in the same order they were introduced.
- Differences between the boxes with custom selection occurred when the results of a Validation Station were run through a second Validation Station. The issue was fixed and now there are no differences between boxes with custom selection.
- When the Digitize Document activity was used together with Microsoft Azure Computer Vision OCR engine, the rotation was not working when HandwritingRecognition parameter was set as True. The issue was fixed and now the information is processed correctly.
- When using Digitize Document activity, an error occurred when trying to process images with a lot of text. The bug was fixed by improving the scaling process.
- Fixed an issue that was throwing an error when trying to train the Keyword Based Classifier activity in the training scope and the extraction was run without a classification reference. Now, the fact that there is no learning information is only logged, not thrown as an error.
- An error was thrown when using the FlexiCapture Extractor activity and the same name was given to both a table column and a field. The issue was fixed and the .fcdot file is now processed as expected.
If an error mentioning the Docotic.Pdf library is encountered at runtime, then you should upgrade the UiPath.IntelligentOCR.Activities package to version v3.1.0 or higher.
Release Date: 26 August 2019
This release presents a new activity, RegEx Based Extractor accompanied by a wizard as well. You are now able to easily configure your RegEx expression by using the wizard and to extract specific information from documents.
Three new languages, German (DE), South Korean (KO), and Portuguese (PT-BR) are available for the UiPath.IntelligentOCR.Activities package.
- Major updates related to the internal handling of
- Fixed an issue that caused the Digitize Document activity to throw an error when trying to process a high-resolution image inside a
- Fixed an issue that caused the Present Validation Station activity to fail loading images with very high resolutions. Now all images, no matter the resolution size, are processed correctly.
- Fixed an issue that caused the Digitize Document activity to receive incorrect character coordinates. The character coordinates are now received correctly.
- Improved integration of the Digitize Document activity with OCR engines, including support for rotation (where supported by the OCR engine) and improved accuracy of word-building.
If you want to use the UiPath.IntelligentOCR.Activities package in the same project with the UiPath.PDF.Activities package, you need to use either version 2.x of both, or versions 3.x of both.
UiPath.IntelligentOCR.Activities version 3.0 and higher is incompatible with a UiPath.PDF.Activities version lower than 3.0, and a UiPath.PDF.Activities version 3.0 or higher is incompatible with a UiPath.IntelligentOCR.Activities version lower than 3.0.
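The compatibility rule above boils down to "both packages below 3.0, or both at 3.0 or higher". A minimal Python sketch of that check follows; `major` and `packages_compatible` are hypothetical helper names, and the sketch only encodes the rule as stated in these notes.

```python
def major(version: str) -> int:
    """Return the major component of a version string such as '3.0.1' or 'v2.2.0'."""
    return int(version.lstrip("vV").split(".")[0])


def packages_compatible(intelligent_ocr_version: str, pdf_version: str) -> bool:
    """True when both packages sit on the same side of the 3.0 boundary."""
    ocr_is_v3_or_later = major(intelligent_ocr_version) >= 3
    pdf_is_v3_or_later = major(pdf_version) >= 3
    return ocr_is_v3_or_later == pdf_is_v3_or_later


ok_pair = packages_compatible("3.0.1", "3.1.0")   # both 3.x
bad_pair = packages_compatible("3.0.0", "2.9.9")  # mixed 3.x and 2.x
```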
Release Date: 16 July 2019
We have improved your experience with the Taxonomy Manager wizard. You can now edit the name you gave to any of your Groups or Categories upon creating them.
- Fixed an issue that was populating the headers of tables reported by the Data Extraction Scope activity with the names of the extractors. Now, the headers are populated only with the custom names from the Taxonomy.
Release Date: 26 June 2019
We want to reach out to the entire world and make automation a language everyone can speak. So, starting with this release, the entire platform is available in Chinese.
Chinese can only be used in this pack when installed in Studio v2019.4.4 or v2019.7 or above.
Release Date: 24 June 2019
The Train Extractors and Classifiers activity has been deprecated starting with v2.2.0 and is now replaced by the two newly added activities:
The Data Extraction Scope now has a new check box, FormatValuesIfPossible, available in the Properties field.
The Taxonomy Manager has received a new Close button that is visible and accessible throughout the settings wizard.
Release Date: 21 May 2019
We have improved error messages throughout the entire activity pack, so you can solve issues faster and with less hassle.
Release Date: 25 April 2019
As you're probably used to by now, month after month we draw closer to our final goal of creating the ultimate document processing platform. Alongside the first enterprise release of this year, the IntelligentOCR activities pack has been imbued with some new activities, as follows:
The UiPath.DocumentProcessing.Contracts pack enables you to implement your own extractor and classifier activities by simply referencing it. This assembly contains all the classification and extraction interfaces that underlie the IntelligentOCR activities.
The Taxonomy Manager now displays the Document Type ID of the document type that is being edited.
- While migrating to the public UiPath.DocumentProcessing.Contracts, the IntelligentOCR v2.0.0 activity pack introduces breaking changes for the Classify Document Scope and Train Classifiers And Extractors activities.
- Opening the Data Extraction Wizard throws an error when the Data Extraction Scope activity or a parent activity is commented out.
- Fixed an issue that caused the Present Validation Station activity to throw an exception when processing certain
- Digitize Document was unable to detect check boxes in certain documents.
Release Date: 22 February 2019
- Fixed an issue that caused the Process Document activity to throw an error when processing large PDF files.
Release Date: 20 February 2019
The Taxonomy Manager is the next piece of the document processing puzzle, a wizard created to help you build custom taxonomy files that can then be reused across processes.
We have developed the Load Taxonomy activity, which grants you the ability to load a taxonomy created with the aid of the Taxonomy Manager wizard into a variable that can then be passed on to other activities.
The DegreeOfParallelism property has been added to the Digitize Document activity, enabling you to perform OCR analysis on multiple pages simultaneously. This is not a breaking change, so old workflows still function properly after updating to the latest version of the pack.
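The general idea of a degree-of-parallelism setting can be sketched in Python with a bounded worker pool. This is an analogy only: `ocr_page` is a stand-in for a real OCR call, and `digitize` with its `degree_of_parallelism` parameter is a hypothetical function, not the activity's API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def ocr_page(page_number: int) -> str:
    # Stand-in for a real OCR call on a single page (assumption).
    return f"text of page {page_number}"


def digitize(pages: List[int], degree_of_parallelism: int = 1) -> List[str]:
    """Run OCR on several pages at once, with at most degree_of_parallelism workers."""
    with ThreadPoolExecutor(max_workers=degree_of_parallelism) as pool:
        # map() returns results in input order, regardless of completion order.
        return list(pool.map(ocr_page, pages))


texts = digitize([1, 2, 3], degree_of_parallelism=2)
```

With a value of 1 the pages are processed one at a time, which mirrors the point that existing workflows keep working unchanged.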
The IntelligentOCR pack is now upgraded to .NET Framework v4.6.1.
The MatchingDocumentDefinition property of the FCDocument variable has been exposed. Assigning it to a variable generates the same result as a Classify Document activity.
- The Tesseract OCR engine fails to properly read images with black borders.
Release Date: 18 February 2019
The IntelligentOCR pack has been upgraded with some new activities that regard document classification. These activities are:
- Fixed an issue that caused the Process Document activity to crash when processing documents that contained check boxes.
- Certain types of
- Certain types of .jpg files caused the Digitize Document activity to throw errors.
- In certain circumstances, editing a table with confidence below 100% and making no changes to it modified the confidence to 100%.
- Fixed an issue that caused the Extract manual token as reference for this field button to remain disabled.
Release Date: 21 January 2019
The Digitize Document activity has been improved performance-wise with some backend changes.
- Fixed an issue that caused certain UI elements to flicker in the Validation Station wizard.
- The OperatorConfirmed flag in the ExtractionResults JSON file remained False regardless of whether a user had confirmed the extraction results or not.
- In certain cases, the Prepare Validation Station Data activity could not read document information from
Release Date: 10 January 2019
This new year brings two more languages to the entire UiPath Platform: French and Russian. Since we laid down the foundations of localization in our previous release, we are continuing our efforts to bring you a more immersive experience and lower the language barrier bit by bit.
Release Date: 14 December 2018
- Fixed an issue that caused the FlexiCapture engine to always return a confidence score of 100.
Release Date: 12 December 2018
The IntelligentOCR package has received a major update, as we've developed three new activities that enable you to approach Document Processing in a much simpler manner. The new activities are:
- Present Validation Station - offers attended users the ability to make real-time CRUD (Create, Read, Update, and Delete) operations on documents for the purpose of classification and human data validation and extraction.
- Prepare Validation Station Data - creates a bridge between FlexiCapture's Process Document activity and the new Validation Station, ensuring a much more user-friendly data validation experience.
- Digitize Document - provides a new way of generating text versions from incoming documents, being able to process any PDF and most image formats.
Release Date: 8 October 2018
The moment is finally here - the entire UiPath Platform has been localized, so that you can have a truly immersive experience, from install to design and execution. Now, besides English, you can access everything, including our online documentation, in Japanese.
Release Date: 4 June 2018
To step up on our OCR game, coming to the aid of your digitization efforts, we have integrated the capabilities of the ABBYY FlexiCapture SDK into the new UiPath.IntelligentOCR.Activities pack, which contains the following:
- IntelligentOCR Scope - Initializes the ABBYY FlexiCapture engine and provides a scope for all IntelligentOCR activities.
- Process Document - Processes a document with the FlexiCapture engine and converts it to an FCDocument variable which can be used in other activities.
- Classify Document - Enables you to classify a given document based on an ABBYY classifier file and one or more templates.
- Export Document - Exports FlexiCapture documents to one of the
- Get Field - Retrieves a specified field from an FCDocument variable and returns it as an
- Get Table - Retrieves a specified table from an FCDocument variable and returns it as an
- Validate Document - Validates a processed document contained in an FCDocument variable by using the ABBYY SDK and returns it in the same format.