Document Understanding - October 2021

October 2021

Release notes for Document Understanding in Automation Cloud, October 2021.

General Release Notes - Document Understanding

October 19, 2021

Improvements

Fields that have fewer than 10 labeled documents can be deleted without confirmation.

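Bug Fixes

Fixed a bug affecting imported files that have the same name.
Fixed a bug in Google OCR that raised an error on documents containing empty pages.
Fixed a bug where the file count was displayed incorrectly in the Import data dialog box for Validation Station or Data Manager dataset imports.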

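Known Issues

The default export (document level) works only with ML Packages version 21.10 or later in AI Center. The version is shown in the Change Log column of the ML Packages view in AI Center. For older versions, select the Backwards-compatible export checkbox in the Export files dialog box.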

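October 1, 2021

Multi-page document support

Data Manager now supports multi-page documents. This is a major update that affects all aspects of the machine learning flow: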

Import: you can upload documents of up to 150 pages. To bypass this limit, at the risk of an unstable labeling experience, select the Enable large documents checkbox in the Import data dialog box.

Prelabeling: the document is prelabeled as a whole, producing the same results as running in an RPA workflow, but it takes more time for larger documents. See also Known Issues below.

Labeling: labeling is more convenient thanks to natural scrolling through document pages.

Export: done at document level by default. To export documents at page level, select the Backwards-compatible export checkbox in the Export files dialog box; this is also recommended if the accuracy of a model trained on the default export is below expectations.

Training: in most scenarios, models trained on the new document-level exported datasets should perform the same as models trained on the page-level Backwards-compatible export. However, if the models perform below expectations, we recommend retrying the training with a Backwards-compatible export, in case it produces better results.

Evaluation: this is the main motivation for the multi-page document support feature, since Evaluation scores will more accurately reflect run-time performance. Note that this assumes each multi-page document contains a single logical document. For instance, if you import 20-page file packets containing 10 invoices of 2 pages each, these should not be used as part of Evaluation sets. They can, however, be used as part of Training sets, but only if you export with the Backwards-compatible export option enabled.
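
As a rough illustration of the 150-page import limit described above, the following sketch flags documents that would need the Enable large documents checkbox before import. It is not part of Data Manager: the `DocumentInfo` type and the `needs_large_document_option` function are hypothetical names, and the page counts are assumed to be known in advance.

```python
from dataclasses import dataclass

# Default per-document limit quoted in the release note above.
DEFAULT_MAX_PAGES = 150


@dataclass
class DocumentInfo:
    """Hypothetical record describing a file queued for import."""
    name: str
    page_count: int


def needs_large_document_option(docs, max_pages=DEFAULT_MAX_PAGES):
    """Return the names of documents that exceed the default page limit."""
    return [doc.name for doc in docs if doc.page_count > max_pages]


docs = [
    DocumentInfo("invoice.pdf", 2),
    DocumentInfo("contract.pdf", 180),
    DocumentInfo("report.pdf", 150),  # exactly at the limit, so still allowed
]
print(needs_large_document_option(docs))
```

Only documents above the limit are reported, since the note says documents of up to 150 pages can be uploaded without the extra option.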

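Improvements

Export schema support using the radio button in the Export files dialog box.

Maximum import size increased to 2 GB or 2000 pages.

Test set renamed to Evaluation set, for consistency with AI Center evaluation pipelines.

The Predict button is displayed by default in the management bar, but it is enabled only after the Prelabeling settings are configured.

Removed all limits on the number of samples per field in exports from Evaluation sets.

Added the Data Manager session name next to the file name in the management bar, making it easier to identify the session you are working on when multiple Data Manager tabs are open.

Support for Chinese-language documents.

Accessibility improvements.

Localization for Portuguese (Portugal), Russian, and Turkish.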
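
The new maximum import size can be checked before uploading with a small helper such as the one below. This is only a sketch under stated assumptions: it reads "2 GB" as 2 GiB, treats both limits as applying to the total of a single import, and requires both to hold. The function name `import_within_limits` is hypothetical and not a Data Manager API.

```python
# Assumption: "2 GB" is interpreted as 2 GiB; the product may count differently.
MAX_IMPORT_BYTES = 2 * 1024**3
MAX_IMPORT_PAGES = 2000


def import_within_limits(files):
    """Check a planned import given (size_in_bytes, page_count) tuples.

    Assumption: the limits apply to the whole import, and both must be met.
    """
    files = list(files)
    total_bytes = sum(size for size, _ in files)
    total_pages = sum(pages for _, pages in files)
    return total_bytes <= MAX_IMPORT_BYTES and total_pages <= MAX_IMPORT_PAGES


planned = [(10_000_000, 500), (20_000_000, 1500)]
print(import_within_limits(planned))
```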

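Known Issues

The Chinese Invoices schema does not format Chinese-style dates in the standard yyyy-mm-dd format. This will be improved in a future release.
Data Manager parses dates differently from the ML model at run time. If you see a date parsed incorrectly in Data Manager, it will most likely still be parsed correctly in the model's run-time predictions. This is a known issue that will be addressed in an upcoming patch.
Currently, when the Predict option is used with public endpoints, only the first 10 pages of a document can be prelabeled. This is a known issue, and an upcoming patch will include an enhancement. However, using the Predict option with an ML Skill in AI Center imposes no such limit.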
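
Until Chinese-style dates are formatted as yyyy-mm-dd, extracted date text could be normalized in a post-processing step. The sketch below is one possible workaround, not product behavior: `normalize_chinese_date` is a hypothetical helper, and it assumes dates written with digits in the form 2021年10月1日 (dates written with Chinese numerals are not handled).

```python
import re
from datetime import date

# Matches <year>年<month>月<day>日, allowing optional spaces around the markers.
_CN_DATE = re.compile(r"(\d{4})\s*年\s*(\d{1,2})\s*月\s*(\d{1,2})\s*日")


def normalize_chinese_date(text):
    """Return the first Chinese-style date in text as yyyy-mm-dd, or None."""
    match = _CN_DATE.search(text)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        # date() validates the calendar values, e.g. it rejects month 13.
        return date(year, month, day).isoformat()
    except ValueError:
        return None


print(normalize_chinese_date("2021年10月1日"))
```

Returning `None` for unparseable input leaves it to the caller to decide whether to keep the original text or flag the field for review.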
