UiPath Documentation
document-understanding
2024.10

Document Understanding modern projects user guide

Last updated: April 6, 2026

Annotate documents

After successfully creating your project and uploading your documents to a specific document type, they are automatically pre-annotated. This is done using specialized models, based on the document type's schema. The schema clearly defines the fields you want to extract from a particular document type. To find the document type's schema, go to the Annotation page and check the Fields section.

Screenshot of the Fields menu.

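Predictions appear as underlines on the text in the document and cannot be removed. If a prediction is incorrect and does not match a specific field, you can ignore it. During training, only confirmed fields are used; underlines are not taken into account.

As you add more annotations, the prediction underlines should gradually align with your input. At first, there may be some inconsistencies between the underlines and the fields you annotated. As the number of annotations grows and the model improves, the underlines should match the data you provide more closely.

In the following image, the shipping address was incorrectly predicted to include a person's name.

Screenshot of the Field name menu.

To resolve this, simply confirm the shipping address. You do not need to remove the underlined text related to the name. As you continue annotating and correcting such errors, mismatches between underlined text and confirmed fields should become less frequent.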

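Note:

At least 40 actions are required to trigger model training. For example, if you have 20 documents, you need to annotate at least 2 fields in each document to reach a total of 40 actions.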
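The arithmetic in this note can be sketched in a few lines of Python. This is only an illustration, not part of the product: the function name is hypothetical, and it assumes that each annotated field counts as exactly one action and that annotations are spread evenly across documents, as the example in the note suggests.

```python
import math

# Threshold stated in the note: at least 40 actions trigger model training.
MIN_ACTIONS_FOR_TRAINING = 40


def min_fields_per_document(num_documents: int,
                            threshold: int = MIN_ACTIONS_FOR_TRAINING) -> int:
    """Smallest number of fields to annotate per document to reach the threshold.

    Assumes one annotated field equals one action (an assumption for
    illustration) and an even spread of annotations across documents.
    """
    if num_documents <= 0:
        raise ValueError("num_documents must be a positive integer")
    # Round up, because a fraction of a field cannot be annotated.
    return math.ceil(threshold / num_documents)


print(min_fields_per_document(20))  # 2, matching the example in the note
```

With fewer documents, more fields per document are needed: under the same assumptions, 15 documents require 3 fields each.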

Tip:

To optimize model performance, follow the suggestions in the Recommendations section.

Screenshot of the Annotate page.

Validate predicted documents

After all documents are uploaded and predicted, your goal is to either validate or modify the pre-annotated fields. For a document where all fields are accurately predicted, select Confirm to approve all fields at once. Once confirmed, a document is marked with a green shield symbol in the document list.

Screenshot of the Annotate page.

If a document is only partially confirmed, it is marked with an empty shield symbol in the document list. This indicates that annotation for this document is In Progress. Your goal is to bring every document to the Confirmed state.
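The relationship between field confirmations and document status can be sketched as follows. This is a minimal model for illustration only: the names are hypothetical and do not correspond to any UiPath API, and the "Not started" state for a document with no confirmed fields is an assumption, since this guide names only Confirmed and In Progress.

```python
from enum import Enum


class DocumentStatus(Enum):
    CONFIRMED = "Confirmed"        # green shield in the document list
    IN_PROGRESS = "In Progress"    # empty shield in the document list
    NOT_STARTED = "Not started"    # hypothetical label, not named in the guide


def document_status(field_confirmed: dict[str, bool]) -> DocumentStatus:
    """Derive a document's status from per-field confirmation flags."""
    confirmations = list(field_confirmed.values())
    # Guard against an empty schema: all([]) is True in Python.
    if confirmations and all(confirmations):
        return DocumentStatus.CONFIRMED
    if any(confirmations):
        return DocumentStatus.IN_PROGRESS
    return DocumentStatus.NOT_STARTED


status = document_status({"Name": True, "Shipping Address": False})
print(status.value)  # In Progress
```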

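During validation, you may encounter the following situations:

  • The prediction is correct and should be validated.
  • The prediction is incorrect and the field is present in the document.
  • The prediction is incorrect and the field is missing from the document.
  • There is no prediction.

The prediction is correct and should be validated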

If the prediction is accurate, you can confirm it by either selecting the underlined text and selecting Confirm or checking the confirmation checkbox for the field. The optimal method, however, is to press the hotkey assigned to the field (“N” in this scenario).

Screenshot of the Annotate page.

The prediction is incorrect and the field is present in the document

If the prediction is incorrect, select the correct text from the document and the appropriate field from the dropdown, then select Confirm.

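When working with tables, you can choose to ignore incorrectly predicted values. These values are not used for model training, and the retrained model learns to avoid predicting them in later iterations.

The prediction is incorrect and the field is missing from the document

If the prediction is incorrect and the field is missing from the document, select the three-dot icon next to the field name and select Mark as missing.

Important: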

You can also mark wrong fields as missing. For example, if you do not have a Vendor Address in your document but during processing a different field was pre-labeled as Vendor Address, you can just mark it as missing during validation.

Screenshot of the Annotate page.

No prediction

Fields that have no prediction are displayed as empty cells. You can mark these cells as missing one by one, or in bulk by selecting the Confirm button.
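The four validation cases described above can be summarized as a small decision function. This is a sketch for orientation only: the function and parameter names are hypothetical, the returned strings paraphrase the steps in this guide, and the branch for a field that has no prediction but is present in the document is an assumption, because the guide covers only empty cells that are marked as missing.

```python
def validation_action(has_prediction: bool,
                      prediction_correct: bool,
                      field_in_document: bool) -> str:
    """Return the manual step suggested by this guide for one field."""
    if has_prediction and prediction_correct:
        return "Confirm the prediction"
    if field_in_document:
        # Covers an incorrect prediction; applying the same step when there
        # is no prediction at all is an assumption made for this sketch.
        return "Select the correct text, choose the field, then select Confirm"
    # Incorrect prediction with the field absent, or an empty cell.
    return "Mark as missing"


print(validation_action(True, True, True))     # Confirm the prediction
print(validation_action(True, False, False))   # Mark as missing
print(validation_action(False, False, False))  # Mark as missing
```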

Document type settings

You can change the document type settings from the Annotate view.

To do so, select the three-dot icon on the right side of the document type name and select Settings.

Screenshot of the Settings button.

You can change the following settings:

  • Base model: Dataset size estimations used in the Recommended Actions depend on the base model used for training. Using the base model most similar to your document type reduces the amount of annotation work required.
  • Number of languages: Dataset size estimations used in the Recommended Actions depend on the number of languages in the dataset. More languages generally require annotating more data.
