Document Understanding Release Notes
ML packages and public endpoints version history
Release date: 27 November 2024
Released in UiPath Document Understanding OCR and endpoints | v24.11.3
Improvements
In this release, we have enhanced the accuracy and performance for various text types. This includes text printed on very large or low-resolution images, as well as handwritten text.
Recognition of checkboxes, especially those represented by fully blackened squares or rectangles, is significantly improved. We have also fine-tuned signature detection.
Release date: 23 July 2024
Released in UiPath Document Understanding OCR and endpoints (including UiPath Document Understanding OCR_CPU) | v24.7
Improvements
- The accuracy for the Azerbaijani language is improved by adding recognition for the ə and Ə characters.
- Recognition and detection of Magnetic Ink Character Recognition (MICR) text is improved, with better accuracy especially on checks.
- Previously, numbers were sometimes not recognized when a space was used as a separator. Such numbers are now recognized.
Bug fixes
The confidence score for the UiPath Document Understanding OCR is improved, particularly when used on lower quality images. In workflows where confidence score is used to decide if documents need human validation in Action Center, this improvement may result in an increased number of documents undergoing validation.
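To see why a change in OCR confidence scores can change validation volume, consider a workflow that routes a document to human validation whenever its confidence falls below a cutoff. The following is a minimal sketch only: the `needs_validation` helper, the 0.80 threshold, and the sample scores are hypothetical and are not part of any UiPath API.

```python
def needs_validation(confidence: float, threshold: float = 0.80) -> bool:
    """Return True when a document should be sent to human validation."""
    return confidence < threshold


def count_for_validation(confidences, threshold: float = 0.80) -> int:
    """Count how many documents fall below the confidence threshold."""
    return sum(needs_validation(c, threshold) for c in confidences)


# Hypothetical scores for the same five documents under two OCR versions.
# The second list assumes lower scores on poor-quality images.
scores_old = [0.95, 0.91, 0.88, 0.84, 0.79]
scores_new = [0.93, 0.86, 0.78, 0.74, 0.61]

print(count_for_validation(scores_old))
print(count_for_validation(scores_new))
```

If the number of documents sent to validation changes after upgrading, reviewing the threshold against a sample of your own documents is a reasonable first step.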
Release date: 3 October 2024
Released in Data Extraction ML packages | v24.4.4
Bug fixes
- We've fixed an issue that was causing AI Center training pipelines to report inaccurately high scores for ID Number and Phone Number field types. This ensures that the reported scores match the actual scores.
- We've corrected an issue that was related to parsing values on Japanese fields when the Extended Languages OCR was in use.
Release date: 14 August 2024
Released in endpoints + DocumentUnderstanding + Data Extraction ML packages | v24.4.3
Improvements
Improved field text formatting for Chinese, Japanese, and Korean languages when using the UiPath® Extended Languages OCR in the digitization step.
Release date: 24 May 2024
- DocumentUnderstanding + Data Extraction ML packages | v24.4.0
- DocumentClassifier ML packages | v24.4.0
What's new
Improvements
- Accuracy for the Invoices Japan ML package is improved. There are also 11 new fields for the Invoices Japan model. For the complete list of extracted fields, check the Out-of-the-box models details file.
- The performance for the Payslips model is improved.
- New IDs are available for the ID Cards ML package:
- Aadhaar ID cards
- Saudi Arabian ID cards
- PAN cards
- New fields are available for the UB04 ML package. For the complete list of extracted fields, check the Out-of-the-box models details file.
- New fields are available for the Checks ML package. For the complete list of extracted fields, check the Out-of-the-box models details file.
Erratum - added 20 June 2024: Added information regarding a bug fix related to the parsing of Japanese dates.
Erratum - added 28 May 2024: Added more information on several improvements.
Release date: 13 March 2024
Released in DocumentUnderstandingOCR Endpoints | v24.3.2
A new version for the Document Understanding OCR is now available for general usage.
- The accuracy for Turkish (TUR) is improved. The performance for characters with diacritics (such as Ç, ç, Ğ, ğ, I, ı, İ, i, Ş, ş, Ö, ö, Ü, ü) is improved.
- The accuracy for Eastern-Arabic numerals (٠, ١, ٢, ٣, ٤, ٥, ٦, ٧, ٨, ٩) is improved.
Release date: 1 April 2024
Released in Data Extraction ML Packages | v24.2.0
- 1040 Schedule C
- 1040 Schedule D
- 1040 Schedule E
- UB04
Release date: 15 October 2024
Released in UiPath Document Understanding OCR and endpoints | v23.10.5
Improvements
This release brings accuracy and performance improvements for handwriting recognition.
Bug fixes
We've fixed an issue where annotation boxes were returned horizontally, even though some documents were slightly skewed, causing misalignment in the annotation.
Release date: 28 March 2024
Released in Data Extraction ML packages | v23.10.4
A new version for the out-of-the-box pre-trained ML packages is now available for general usage.
- The accuracy for Turkish (TUR) is improved. The performance for characters with diacritics (such as Ç, ç, Ğ, ğ, I, ı, İ, i, Ş, ş, Ö, ö, Ü, ü) is improved.
- The accuracy for Eastern-Arabic numerals (٠, ١, ٢, ٣, ٤, ٥, ٦, ٧, ٨, ٩) is improved.
- The accuracy for datasets smaller than 400 pages is improved.
Release date: 12 February 2024
Released in Endpoints + DocumentUnderstanding + Data Extraction ML Packages | v23.10.3
A new version for all out-of-the-box pre-trained ML packages that are part of AI Center is now available for general usage.
This new version brings a bug fix related to the extraction of bidirectional (left-to-right and right-to-left) text values.
Release date: 23 January 2024
Released in DocumentUnderstanding + Data Extraction ML packages | v23.10.2
A new version for all out-of-the-box pre-trained ML packages is now available for general usage.
This release fixes a bug that occasionally caused training to fail.
Release date: 26 October 2023
Released in Endpoints + DocumentUnderstanding + Data Extraction ML packages | v23.10.0
A new version for all out-of-the-box pre-trained ML packages is now available for general usage.
We are constantly working to improve your Document Understanding experience. For this release, we made sure to bring minor security and stability improvements to our product.
Release date: 3 August 2023
Released in DocumentUnderstanding + Data Extraction ML packages | v23.7.0
- In documents where a table runs across many pages, a table row (a line item) gets split across 2 pages, in some cases even more. The previous model versions assumed that each page break was also a row-break, and it broke items into multiple pieces. The current model version fixes this issue. To benefit from this feature in a workflow, you need to use the DocumentUnderstanding.ML.Activities package version 1.23.0-preview, and the 23.7.0 model version in that particular workflow.
- Models now have a faster prediction time per page, and use RAM more efficiently, allowing processing of larger documents.
Release date: 24 March 2023
Released in DocumentUnderstanding | v23.4.2
The UiPath Document OCR public endpoint has been updated and now provides handwriting language support for German and French, and print language support for Danish, Finnish, Norwegian, and Swedish. Here's the complete list of the new supported languages: Danish, Swedish, Norwegian, Finnish, Polish, Hungarian, Czech, Slovakian, Estonian, Latvian, Lithuanian, Slovenian, Croatian, Serbian, Turkish.
Release date: 10 May 2023
Released in DocumentUnderstanding + Data Extraction ML packages | v23.4.0
The UiPath Document OCR is now available as an out-of-the-box pre-trained package, and it is available for both GPU and CPU usage. This enables customers who prefer to avoid using public endpoints to deploy UiPath Document OCR in their own tenants, in an isolated environment.
Seven new out-of-the-box pre-trained ML packages are now available for general usage:
- Certificate of incorporation/Good Standing
- Certificate of Origin
- Children Product Certificate
- CMS1500
- EU Declaration of Conformity
- Invoices Shipping
- Pay slips
Release date: 23 February 2023
Released in Endpoints + DocumentUnderstanding + Data Extraction ML packages | v23.2.0
What's new & improvements
A new version of the out-of-the-box pre-trained ML packages (23.1.0) and their public endpoints has been released, now using cutting edge LayoutLM Transformers based architecture, which is more powerful and increases accuracy overall, especially on column fields (tables).
This improvement has made the out-of-the-box pre-trained ML packages more powerful, meaning that you may experience longer latency for training and for predictions.
For all situations where latency is critical (for example, attended scenarios), we recommend deploying the models as ML Skills using a GPU.
We have improved how the scores are calculated after Training/Evaluation/Full pipelines to provide a separate score for each column field. Before this improvement, F1 scores were calculated as a whole, for all column fields taken together.
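The difference between a single score for all column fields and a separate score per column field can be illustrated with a small calculation. This is a sketch under stated assumptions: the field names and counts are hypothetical, and pooling the counts before computing F1 is one common way to get a combined score, not necessarily the method the pipelines use.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when all counts are zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical (true positives, false positives, false negatives) per column field
counts = {
    "description": (90, 5, 5),
    "quantity": (40, 30, 30),
}

# One score per column field
per_field = {name: f1_score(*c) for name, c in counts.items()}

# One score for all column fields taken together (counts pooled first)
pooled = f1_score(*(sum(col) for col in zip(*counts.values())))

for name, score in per_field.items():
    print(f"{name}: {score:.3f}")
print(f"all column fields: {pooled:.3f}")
```

A pooled score can hide a weak column: here the combined value sits between a strong field and a much weaker one, which only the per-field breakdown reveals.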
An upcoming removal is announced for the Manual edits feature used in the model evaluation. More information here.
Known issues
The project import from AI Center is currently disabled. We are actively working on this and expect to have it reenabled by the end of March.
Erratum 8 May 2023
Known issue
Fatal Python error: Segmentation fault is received when running a Full or Training Pipeline. We recommend using the ML Packages with v23.4 until this bug is fixed.
Erratum 20 April 2023
Overall score for all pipelines is now an Accuracy. Previously it was an F1 score. The evaluation artefacts in AI Center still contain both accuracy and F1 score, for backwards comparability.
Release date: 11 January 2023
Released in Endpoints and DocumentClassifier | v23.1.0
We have improved the F1 scores and they are now also displayed for Training pipelines.
The Artifacts folder has an updated list of artifacts.
The DocumentClassifier model now predicts 25 classes, instead of 26, due to the removal of the Delivery Notes class.
Release date: 13 December 2022
Released in endpoints + DocumentUnderstanding + Data Extraction ML packages | v22.11.0
This release brings significant improvements to the public endpoints of the out-of-the-box pre-trained ML packages, meaning that we are now using the latest LayoutLM based Deep Learning architecture.
This improvement provides better accuracy on all document types, especially for the Invoices model, and it also improves the accuracy on column fields and tables.
We added new extracted fields to the Invoices model: Shipping Date, Vendor email address, Bank name, Bank account number, IBAN, SWIFT Code, Bank Address, Bank Routing number, and Tax rate. You can check the list of extracted fields by accessing this page and clicking on the link available for each model.
Model scores are now returned by Training pipelines too, not only by Full or Evaluation pipelines.
F1 scores are now available for each column field. Until now, F1 scores were available only for all column fields taken together.
Release date: 7 October 2022
Released in endpoints + DocumentUnderstanding + Data Extraction ML packages | v22.10.0
What's new & improvements
-Preview tag: InvoicesAustralia, InvoicesIndia, PurchaseOrders.
The DeliveryNotes model has been renamed as BillsOfLading.
Ten new pretrained models are now available: Acord25, 1040, Checks, Bank Statements, Financial statements, Packing Lists, Acord131, Acord126, Acord140, Vehicle Titles.
Bug fixes
Several bug fixes have been made to the above mentioned packages.
Release date: 4 October 2022
Released in UiPathDocumentOCR | v22.10.0 Cloud
A new feature is now available for barcodes and QR codes detection.
Accuracy improvements have been made on long strings like email addresses and URLs, on fixed width fonts, and on handwriting and signatures detection.
Page rotation detection has also been improved.
Release date: 6 September 2022
Released in DocumentUnderstanding + Data Extraction ML packages | v22.6.0-preview
There are 18 new Preview ML packages available with a more advanced model architecture for our DU ML Packages in AI Center. You can easily identify them by the Preview suffix at the end of the package name, for example: InvoicesPreview, PurchaseOrderPreview, Acord125Preview, etc.
We've updated the public endpoints list with all the new Preview ML packages. You can consult it on the Public Endpoints page.
These preview models don't consume DU/AI units from your licensing entitlement.
Fixed a bug on private skills usage. A private skill can now be used only with an API key that belongs to the same organization that is using the AI Center instance.
Release date: 22 July 2022
Released in DocumentUnderstanding + Data Extraction ML packages | v22.5.2
Bug fixes
eol classifier and line_detection methods into a single method.
Known issue
There is a known issue for the Invoices package that occasionally leads to an error when trying to run an auto-fine-tuning loop in AI Center.
Release date: 18 July 2022
Released in DocumentUnderstanding + DocumentClassifier + Data Extraction ML packages | v22.5.1
Bug fixes
- Fixed a bug that was causing the extracted fields to be shown on the wrong page in Validation Station.
- Fixed a bug that was causing the last line of text on some pages to not be digitized in Document Manager.
- Fixed a bug that was preventing displaying some F1 score items from the evaluation_F1_invoices.txt file in Full/Evaluation pipelines in AI Center.
- Fixed a bug that was causing the wrong overall F1 score to be calculated in the evaluation_F1_invoices.txt file in Full/Evaluation pipelines in AI Center whenever a model had only column fields.
Release date: 14 July 2022
Released in DocumentUnderstanding + DocumentClassifier + Data Extraction ML packages | v22.4.2
Bug fixes
- Fixed a bug that was causing the extracted fields to be shown on the wrong page in Validation Station.
- Fixed a bug that was causing the last line of text on some pages to not be digitized in Document Manager.
- Fixed a bug that was preventing displaying some F1 score items from the evaluation_F1_invoices.txt file in Full/Evaluation pipelines in AI Center.
- Fixed a bug that was causing the wrong overall F1 score to be calculated in the evaluation_F1_invoices.txt file in Full/Evaluation pipelines in AI Center whenever a model had only column fields.
Release date: 3 June 2022
Released in AI Center Cloud, Data Extraction ML packages | v22.4.1
Bug fixes
line_detection mode, causing predictions to be different than when called from the ML Skill.
Release date: 10 May 2022
Released in DocumentUnderstanding + DocumentClassifier + Data Extraction ML packages | v22.4.0
What's new
Handwriting capabilities are now available for the UiPathDocumentOCR and the UiPathDocumentOCR_CPU packages, by integrating the HandwritingRecognitionOCR. The same capabilities can be found in the UiPath.OCR.LocalServer Studio package.
New architecture on extraction ML packages, with major benefits, especially to models trained using the DocumentUnderstanding ML package.
Utility Bills, W9, and Passports ML Packages are now available as GA. Five new out-of-the-box pre-trained ML packages are now available in -Preview to ease your work.
Document Search is a new feature available in Document Manager facilitating labelling documents with a high number of pages.
Improvements
Improvements have been made to the ML packages for document extraction in AI Center. The Evaluation Excel spreadsheet has received new sheets, allowing you to better organize and interpret the evaluated data.
ML Packages in Automation Suite offline installation have received a new offline bundle.
Accuracy and performance have been improved for the UiPathDocumentOCR.
Bug fixes
Multiple fixes on parsing date fields, including dates in Column fields, dates in Turkish documents, and dates far into the future.
Release date: 7 March 2022
Released in UiPathDocumentOCR | v22.2.3
Superior capability
Integrated HandwritingRecognitionOCR into UiPathDocumentOCR. In many cases, there is a mix of fields. By integrating the handwriting reading capability, we are able to apply the correct recognition to each field: print recognition to print text, and handwriting recognition to handwritten text.
Although HandwritingRecognitionOCR can detect any handwriting, please know that it is trained and optimized only for English.
Release date: 14 March 2022
Released in DocumentUnderstanding + DocumentClassifier + Data Extraction ML packages | v22.1.6
Bug fixes
Fixed a bug that was causing a training pipeline or a full pipeline in AI Center to fail due to an ML package issue in data pre-processing for an empty line.
Release date: 2 March 2022
Released in DocumentUnderstanding + DocumentClassifier + Data Extraction ML packages | v22.1.4
What's new
The Utility Bills ML package is now generally available.
Improvements
Overall improved performance and scalability.
Significant improvements on scores when training on the new version of the DocumentUnderstanding ML package as compared to previous versions.
Dates in column fields are now parsed correctly.
Date parsing now recognizes Turkish month names.
Changes
Changed the behavior for Training Pipelines and Full Pipelines when training on GPU versus on CPU. The 21.10.x models trained on CPUs were smaller, so they trained faster than the previous versions, while having slightly lower accuracy than before.
This behavior has been reversed with this release, so the model being trained on GPU and on CPU is the exact same model, and the training speed has reverted to what it was before 2021.10, which means training on CPU is again 10-20X slower than on GPU.
Release date: 24 November 2021
Released in Data Extraction ML packages | v21.10.9
Fixed a bug that was throwing a prediction error at runtime.
Release date: 22 October 2021
Released in Data Extraction ML packages and endpoints | v21.10.9
What's new
The PurchaseOrders ML package is now Generally Available and it is ready to be used in your production scenarios.
InvoicesChina, DeliveryNotes, RemittanceAdvices, W2, and W9 ML packages are now in Public Preview. We recommend you check out these packages and start using them for the type of documents you need to process.
Improvements
Implemented document level evaluation. This is representative for the runtime performance in your RPA workflow.
Evaluation can also be done on datasets with fewer fields than the ML Package being evaluated. This facilitates evaluation on out-of-the-box pre-trained ML Packages.
eval.redo_ocr needs to be set to true in the AI Center Evaluation Pipeline.
Training on CPU now uses a smaller model to obtain a 5x-7x speedup. However, you should expect a lower accuracy by 0-5% on CPU.
Evaluation.xlsx files produced by Evaluation Pipelines.
The UtilityBills ML Package has been substantially improved.
Address parsing improvement for addresses which skip 1-2 lines of text.
Improvement on extracting negative values, very large values (11 digits or more), or dates far into the future.
Added support for rotated boxes on receipts.
Concatenated spans enhancement.
Bug fixes
- Fixed a bug that was not returning special characters in String type fields.
- Fixed a bug for the Passports ML Package where the date written as an ordinal number (1st, 2nd, 3rd, 4th, etc.) was not parsed correctly.
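Dates written with ordinal day numbers are a common parsing pitfall, because the suffix has to be removed before a standard date parser accepts the text. The sketch below shows one generic way to handle this in Python. It is not the ML package's implementation, and the `parse_ordinal_date` helper and the default date format are assumptions made for illustration.

```python
import re
from datetime import datetime

# Match an ordinal suffix only when it directly follows a digit,
# so the "st" in "August" is left alone.
_ORDINAL_SUFFIX = re.compile(r"(?<=\d)(st|nd|rd|th)\b", re.IGNORECASE)


def parse_ordinal_date(text: str, fmt: str = "%d %B %Y") -> datetime:
    """Strip ordinal suffixes (1st, 2nd, 3rd, 4th, ...) and parse the date."""
    cleaned = _ORDINAL_SUFFIX.sub("", text.strip())
    return datetime.strptime(cleaned, fmt)


print(parse_ordinal_date("1st March 2021").date())
print(parse_ordinal_date("3rd August 2020").date())
```

Note that `%B` matches month names in the current locale, so this sketch assumes English month names.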
Known issues
Retraining InvoicesJapan and InvoicesChina ML Packages using data from Validation Station is currently not supported. As a workaround, please use Google Cloud Vision OCR.
Upcoming deprecations
All public endpoints, except for UiPathDocumentOCR, FormExtractor, IntelligentFormExtractor, and IntelligentKeywordClassifier, are going to be deprecated for non-West Europe regions starting with December 1, 2021.
Release date: 13 December 2021
Released in UiPathDocumentOCR endpoints | v21.10.5
Release date: 24 September 2021
Released in Data Extraction and endpoints for UiPathDocumentOCR | v21.10.1
Improvements
Added support for rotated text, even if the rotation is at different angles for each word.
Added support for vertical text. This improvement is currently available only for UiPath.IntelligentOCR.Activities, including Validation Station. Data Manager and Machine Learning Extractor do not support vertical text yet.
Accuracy improvement on noisy images or photos: for example, Receipts, ID Cards, or Passports.
Release date: 13 December 2021
Released FormExtractor + IntelligentFormExtractor + IntelligentKeywordClassifier in Endpoints | v21.10
Improvements
Form Extractor, Intelligent Form Extractor, and Intelligent Keyword Classifier are now also available in the Singapore region.
Release date: 11 August 2021
Released in Data Extraction and endpoints for Handwriting Recognition | v21.7
Improvements
Ability to deal with multiple shreds in a single call to the model.
Model retraining and a few other changes for better model accuracy.
Bug fixes
Fixed a bug that caused the pod to restart when there was no memory left.
Release date: 8 June 2021
Released in endpoints and Data Extraction ML packages | v21.5.3
What's new
For images hard to read, as in the case of ID Cards and Passports, two new corresponding pre-trained Out Of the Box Packages have been released.
Improvements
Incorporated retrainable classification fields in our pre-trained Out Of the Box Packages.
Release date: 15 April 2021
Released in endpoints and Data Extraction ML packages | v21.4.5
What's new
Deployed all public endpoints in United States Region.
Deployed public endpoints for Form Extractor, Intelligent Form Extractor, and Intelligent Keyword Classifier in Canada and Japan Regions.
Release date: 9 March 2021
Released in Data Extraction ML packages & endpoints for HandwritingRecognition, DocumentClassifier, + Standalone Docker for UiPathDocumentOCR | v21.4
What's new
HandwritingRecognition with improved recognition using spelling corrections and ability to read machine-printed text reaches general availability.
DocumentClassifier reaches general availability as well.
Improvements on UiPathDocumentOCR for:
- Radio buttons/checkbox detection
- Accuracy on bubble forms
- General accuracy
Release date: 17 February 2021
Released in endpoints and Data Extraction ML packages | v21.1.8
Improvements
Improved accuracy.
InvoicesIndia and InvoicesAustralia are now generally available.
Deployed public endpoints in Australia Region.
https://du.uipath.com/ie/invoices will work for both enterprise and community traffic.
Release date: 18 December 2020
Released in Data Extraction ML packages | v20.11.3
Improvements
Improvements to CPU training to be faster and require less memory.
Date parsing improvements for non-US documents.
Checkbox recognition for UiPathDocumentOCR, including printed or handwritten checkboxes.
Release date: 10 November 2020
Released in endpoints and Data Extraction ML packages | v20.10.4
New features and improvements
A new model for Japanese Invoices.
Evaluation pipelines now return metrics for Classification fields too.
Support for Microsoft Read OCR version 3.
Improvements to date formatting/parsing for detecting day/month/year versus month/day/year formats.
Improvements to decimal point and thousands separators detections for correct number parsing.
Training on CPU is supported in all versions of AI Fabric.
id-no.
Support for training Classification fields only (no Regular or Column fields).
Increased the maximum number of allowed fields from 32 to 40.
Report confidence levels for Column fields.
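The decimal point and thousands separator detection mentioned in the list above addresses a real ambiguity: "1.234,56" and "1,234.56" denote the same amount under different conventions. The following is a rough sketch of one possible heuristic, not the logic used by the ML packages. The `parse_number` helper is hypothetical, and treating a single separator followed by exactly three digits as a thousands separator is an assumption that can be wrong for some inputs.

```python
from decimal import Decimal


def parse_number(text: str) -> Decimal:
    """Parse a number whose decimal and thousands separators are unknown."""
    # Spaces (including non-breaking spaces) are treated as grouping only.
    s = text.strip().replace(" ", "").replace("\u00a0", "")
    last_dot, last_comma = s.rfind("."), s.rfind(",")

    if last_dot != -1 and last_comma != -1:
        # Both kinds present: the one that appears last is the decimal mark.
        decimal_sep = "." if last_dot > last_comma else ","
    else:
        sep = "." if last_dot != -1 else ","
        pos = max(last_dot, last_comma)
        digits_after = len(s) - pos - 1
        # A repeated separator, or exactly three trailing digits, is
        # assumed to indicate thousands grouping.
        is_thousands = pos != -1 and (s.count(sep) > 1 or digits_after == 3)
        decimal_sep = None if (pos == -1 or is_thousands) else sep

    # Drop grouping separators, then normalize the decimal mark to ".".
    for ch in {".", ","} - {decimal_sep}:
        s = s.replace(ch, "")
    if decimal_sep:
        s = s.replace(decimal_sep, ".")
    return Decimal(s)


for sample in ["1.234,56", "1,234.56", "1 234 567", "12,5"]:
    print(sample, "->", parse_number(sample))
```

A value such as "1,234" remains ambiguous on its own, which is why document-level context (for example, the separators seen in other fields) generally helps more than a per-value rule.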
Known issues
class, break, from, finally, global, None, etc. Note that this list is not exhaustive since the package name is used for class <pkg-name> and import <pkg-name>.
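The words listed above are Python reserved keywords. Assuming the concern is that a package name must be usable in `class <pkg-name>` and `import <pkg-name>`, a candidate name can be checked up front with the standard library. The `is_safe_package_name` helper below is hypothetical, and the additional identifier check is an assumption rather than a documented requirement.

```python
import keyword


def is_safe_package_name(name: str) -> bool:
    """Return True if the name is a valid Python identifier and not a keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


for candidate in ["class", "None", "finally", "invoices_v2"]:
    print(candidate, is_safe_package_name(candidate))

# The full list of reserved keywords for the running interpreter
print(keyword.kwlist)
```

Because the set of keywords depends on the Python version, checking against `keyword.kwlist` is more reliable than maintaining a hand-written list.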