Document Understanding Release Notes
2022.4.0
Release date Document Manager On-Premises: 23 May 2022
Stay up to date with the latest news regarding Document Manager by going through the following list of changes made since the last LTS release.
Data Manager has been renamed to Document Manager.
A new option is available, allowing you to permanently delete individual files. The option can be found in the drop-down menu that also contains the download option.
You now have the option to rename previously created fields.
Search inside a document is now possible, allowing you to search for words in your current document.
Data gathered from importing a dataset is now integrated into the JSON files of the subset field. This means that manually modifying the file, or deleting it completely from the dataset, does not affect the training of the model.
deleted keyword.
Document view has received new getting started tips.
EXEC sp_fulltext_service 'restart_all_fdhosts' command by a DBA with the appropriate permissions on the server.
When using the Predict functionality along with Document Manager, tagged data that was not manually edited by the user is replaced with the values received from the model.
Added more descriptive tooltips on Training, Validation, and Evaluation document types.
The edit field dialog box for column and regular fields has been restructured. Post processing, Multi page, Scoring, and Color options have been moved to the Advanced tab. The rest of the options can be found in the General tab.
Speed improvement on import for duplicate documents.
Classification fields now appear in the order they're created.
- Fixed a known issue that was causing the search or the download of a document which contained characters that require URL encoding ("&", ",", "+", "#", "'") in its file name to fail with an invalid query.
- Fixed a bug that caused the Predict functionality to fail on documents with very dense text.
- Removed the 2000-document import limit per session. You can now have more than 2000 documents in a session, subject to the limit of 2000 pages per import.
- Fixed a bug that was not allowing you to select more than 3 boxes when pressing ctrl or shift.
- Fixed a bug that caused an import to hang in processing until being timed out after the pod was restarted but the job did not resume.
- Fixed a bug that prevented the Predict function from extracting data from the entire document. Please note that the 10 page limit when using the function with public endpoints is still in place.
- Fixed a bug for Microsoft Read OCR where endpoints matching the *.cognitiveservices.azure.com subdomains were throwing an "OCR endpoint is not valid" error.
- Fixed a bug that caused the Document Manager dataset import to mix up pages on documents with more than 10 pages.
- Fixed a bug that caused the download or the export to return an empty dataset, or only a small subset of the full dataset, when the All labelled option was selected.
- Maximum import size decreased from 2GB or 2000 pages to 1GB or 2000 pages.
- Searching or downloading a document containing characters that require URL encoding ("&", ",", "+", "#", "'") in its file name fails with "invalid query".
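As an illustration of the characters involved, here is a minimal Python sketch showing how such a file name looks once it is percent-encoded. It uses only the standard library. The helper name encode_file_name and the sample file name are hypothetical. This is not Document Manager's own code and is not a documented workaround for the issue.

```python
from urllib.parse import quote, unquote


def encode_file_name(file_name: str) -> str:
    """Percent-encode a file name for use in a URL path or query value.

    Hypothetical helper for illustration only. safe="" makes quote()
    encode every reserved character, including "/", "&", ",", "+",
    "#" and "'".
    """
    return quote(file_name, safe="")


# Sample file name containing each character listed in the known issue.
sample = "Q1 & Q2, draft+final #3 'v2'.pdf"
encoded = encode_file_name(sample)
print(encoded)

# Decoding restores the original name, so no information is lost.
print(unquote(encoded) == sample)
```

Each of the listed characters has a reserved meaning in a URL (for example, "&" separates query parameters and "#" starts a fragment), which is why a file name containing them can produce an invalid query if it is sent unencoded.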
For more details about all changes that occurred in Document Manager, please consult the previous Release Notes.
- Some PDF files that contain Type3 fonts can result in high memory usage for the Digitizer Service. When this occurs, import operations from Document Manager are degraded. The mitigation is to manually delete the Kubernetes pods that have high memory usage (constantly above 70%).
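The "constantly above 70%" criterion can be sketched as a small check over memory readings collected over time. This is an assumption-laden illustration: the function name, the pod names, and the idea of expressing usage as a fraction of the pod's memory limit are all hypothetical. How the readings are gathered is left to your monitoring tooling.

```python
from typing import Dict, List


def pods_constantly_above(
    samples: Dict[str, List[float]], threshold: float = 0.70
) -> List[str]:
    """Return the pods whose every memory reading exceeds the threshold.

    samples maps a pod name to a list of memory-usage fractions
    (0.0 to 1.0, assumed here to be relative to the pod's memory limit).
    Pods with no readings are never reported.
    """
    return sorted(
        name
        for name, readings in samples.items()
        if readings and all(value > threshold for value in readings)
    )


# Hypothetical readings taken at regular intervals.
readings = {
    "digitizer-a": [0.82, 0.91, 0.88],  # always above 70%
    "digitizer-b": [0.75, 0.40, 0.79],  # dipped below, so not flagged
    "digitizer-c": [],                  # no data, so not flagged
}
print(pods_constantly_above(readings))
```

Requiring every reading to exceed the threshold avoids flagging a pod for a single short-lived spike.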