
Document Understanding modern projects user guide
Annotate documents
After you create your project and upload documents to a specific document type, the documents are automatically pre-annotated. Specialized models do this based on the document type's schema, which defines the fields you want to extract from that document type. To find the document type's schema, go to the Annotation page and check the Fields section.

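Predictions appear as underlines on the text in the document and cannot be removed. If a prediction is incorrect and does not match a specific field, you can ignore it. During training, only confirmed fields are used; the underlines are not taken into account.
As you add more annotations, the predicted underlines should gradually align with your input. At the start, there may be some inconsistencies between the underlines and the fields you annotate. However, as the number of annotations grows and the model improves, the underlines should match your data more closely.
In the following image, the shipping address is incorrectly predicted to include a person's name.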

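To fix this, you only need to confirm the shipping address. You do not need to delete the underlined text related to the name. As you continue annotating and correcting such errors, mismatches between underlined text and confirmed fields should become less frequent.
At least 40 actions are required to trigger model training. For example, if you have 20 documents, you need to annotate at least 2 fields in each document, for a total of 40 actions.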
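The arithmetic behind the 40-action threshold can be sketched as follows. This is only an illustrative calculation, not part of the product: the function name `fields_per_document` is hypothetical, and it assumes that each annotated field counts as one action and that annotations are spread evenly across documents, as in the 20-document example.

```python
import math

MIN_ACTIONS = 40  # training threshold stated in this guide


def fields_per_document(num_documents: int, min_actions: int = MIN_ACTIONS) -> int:
    """Smallest whole number of fields to annotate in every document
    so that the total reaches min_actions.

    Assumption: one annotated field equals one action.
    """
    if num_documents <= 0:
        raise ValueError("num_documents must be positive")
    return math.ceil(min_actions / num_documents)


print(fields_per_document(20))  # 2 fields x 20 documents = 40 actions
print(fields_per_document(15))  # 3 fields x 15 documents = 45 actions
```

With fewer documents, each one needs more annotated fields to reach the threshold; with 40 or more documents, one field per document is already enough under these assumptions.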
To optimize model performance, follow the suggestions in the Recommendations section. These suggestions are designed to improve the overall performance of your model.

Validate predicted documents
After all documents are uploaded and predicted, your goal is to validate or correct the pre-annotated fields. For a document where all fields are accurately predicted, select Confirm to approve all fields at once. A confirmed document is marked with a green shield symbol in the document list.

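If a document is only partially confirmed, it is marked with an empty shield symbol in the document list. This indicates that annotation of that document is In Progress. Aim to have all documents Confirmed.
During validation, you may encounter the following situations:
- The prediction is correct and should be validated.
- The prediction is incorrect, and the field is present in the document.
- The prediction is incorrect, and the field is missing from the document.
- There is no prediction.
The prediction is correct and should be validated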
If the prediction is accurate, you can confirm it by either selecting the underlined text and selecting Confirm or checking the confirmation checkbox for the field. The optimal method, however, is to press the hotkey assigned to the field (“N” in this scenario).

The prediction is incorrect, and the field is present in the document
If the prediction is incorrect, select the correct text from the document and the appropriate field from the dropdown, then select Confirm.
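When working with tables, you can choose to ignore incorrectly predicted values. These values are not used for model training, and the retrained model learns to avoid predicting them in later iterations.
The prediction is incorrect, and the field is missing from the document
If the prediction is incorrect and the field is missing from the document, select the three-dot icon ⁝ next to the field name and select Mark as missing.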
You can also mark wrong fields as missing. For example, if you do not have a Vendor Address in your document but during processing a different field was pre-labeled as Vendor Address, you can just mark it as missing during validation.

No prediction
Fields that have no prediction are displayed as empty cells. You can mark these cells as missing one by one, or in bulk by selecting the Confirm button.
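The four validation situations described above can be summarized as a small decision function. This is a sketch for illustration only: `Action` and `validation_action` are hypothetical names, not a UiPath API, and the handling of a field that has no prediction but is present in the document is an assumption, since this section only describes empty cells being marked as missing.

```python
from enum import Enum


class Action(Enum):
    CONFIRM = "confirm the prediction"
    CORRECT = "select the correct text and field, then confirm"
    MARK_MISSING = "mark the field as missing"


def validation_action(has_prediction: bool,
                      prediction_correct: bool,
                      field_in_document: bool) -> Action:
    """Map the state of one field to the validation action described in this guide."""
    if has_prediction and prediction_correct:
        return Action.CONFIRM
    if field_in_document:
        # Incorrect prediction for a field that exists in the document.
        # Assumption: a field with no prediction that is present in the
        # document is annotated the same way.
        return Action.CORRECT
    # The field is absent from the document, whether it was predicted
    # incorrectly or not predicted at all.
    return Action.MARK_MISSING


print(validation_action(True, False, False).value)  # mark the field as missing
```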
Document type settings
You can change the document type settings from the Annotate view.
To do so, select the three-dot icon ⁝ on the right side of the document type name and select Settings.

You can change the following settings:
- Base model: Dataset size estimations used in the Recommended Actions depend on the base model used for training. Using the base model most similar to your document type reduces the amount of annotation work required.
- Number of languages: Dataset size estimations used in the Recommended Actions depend on the number of languages in the dataset. More languages generally require annotating more data.