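Document Understanding User Guide
Build
This section covers the following experiences:
- Uploading documents and classifying them automatically.
- Uploading documents directly to a document type.
- Managing files in the project (adding and removing files, and adding or changing tags).
- Annotating documents.
- Adding or removing fields.
- Getting a guided experience for training classification and extraction models using recommendations.
Annotate documents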
After you create your project and upload your documents to a specific document type, the documents are automatically pre-annotated. This is done using a combination of generative and specialized models, based on the document type's schema. The schema defines the fields you want to extract from a particular document type. To find the document type's schema, go to the Annotation page and check the Fields section.

For more in-depth information on how to annotate your documents, check the Annotate documents how-to page.
Exceptions for review
You can further improve your model's performance by using documents that were validated in Validation Station.
If there are any changes after the validation step, the Exceptions for review button is displayed for the impacted document type.
Figure 1. Exceptions for review button

For more in-depth information on how to retrain your models, check the Retrain extractors how-to page.
Tag documents
After you upload documents, you can add tags to them.
You can add a tag of up to 100 characters to each document.
To add a tag to your documents, select the documents you want to tag and select the Tags button from the menu above the document types list.

Filtering by tags makes it easier to search for documents. When training a model, you can also view results by tag in the advanced profile.
Document type manager
You can edit the settings for multiple fields from Document type manager.
To get there, select the three-dot icon ⋮ next to the document type you want to edit and select Document type manager from the menu.
Figure 2. Select Document type manager

Recommendations in Document Understanding are displayed only when the user has sufficient permissions to perform the action suggested by the recommendation. If you do not have permissions to execute the recommended actions, you will see a message indicating insufficient access. Users with the Document Understanding Developer, Document Understanding Administrator, and Document Understanding Project Administrator roles can view all available recommendations. The Project Administrator role applies these permissions at the project level only.
Extraction fields
Edit or add new fields
To add a new field, select Add field and fill in the needed information. You can add or edit the following options for each field:
- Field name: the unique name for the field.
- Content type: the content type of the field:
  - String: used for company names or addresses, as well as payment terms, or for any other field where you want to build the parsing or formatting logic manually in the RPA workflow.
  - Number: used for amounts or quantities, with intelligent parsing of the decimal and thousands separators.
  - Date: parses, formats, and unifies the output using the YYYY-MM-DD format.
  - Phone: used for phone numbers. Formatting removes letters and parentheses, and replaces spaces with dashes.
  - ID Number: used for alphanumeric codes and ID numbers. It is similar to the String content type, but removes any characters that come before the : character. If the ID number you need to extract can contain : characters, use the String content type instead to avoid data loss.
- Shortcut: the shortcut key for the field. One key or a combination of two keys is allowed.
- Advanced settings: the available options differ depending on the Content type of the selected field. Select the Advanced settings button for the desired field to edit.
Figure 3. Document type advanced settings

- Field ID: the unique ID for the field.
- Post processing:
  - first_span: if the model predicts more than one instance of a field in a document, return the first one.
  - longest_value: if the model predicts more than one instance of a field in a document, return the value with the largest number of characters.
  - highest_confidence: if the model predicts more than one instance of a field in a document, return the value with the highest confidence.
- Scoring: the measure used to determine accuracy when running evaluations of model predictions. This setting is only available for fields with content type String:
  - exact_match: the prediction is deemed correct (score of 1) only if it exactly matches the true value. If it differs by even a single character, it is deemed incorrect (score of 0). This is the default setting for all fields except String fields.
  - levenshtein: the prediction is deemed partially correct according to the Levenshtein distance between the prediction and the true value. For example, if a 10-letter value is predicted correctly except for the last 2 characters, the score of that prediction is 0.8.
- Date format: this setting is only available for fields with content type Date, and it indicates how ambiguous dates are parsed and returned:
  - Auto
  - US style: YYYY-DD-MM
  - Non-US style: YYYY-MM-DD
- Multi-line: fields that span multiple text lines (such as addresses or descriptions) need to have this checked; otherwise, only the first line is returned.
- Multi-value: the field returns a list with all the values detected in the document.
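The Post processing and Scoring options above can be illustrated with a minimal Python sketch. This is not the product's implementation: the `Prediction` structure, the function names, and the normalization of the Levenshtein distance by the longer of the two strings are all assumptions, chosen so that the 10-letter example above yields 0.8.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """One predicted instance of a field (hypothetical structure)."""
    value: str
    confidence: float  # assumed to lie between 0.0 and 1.0
    position: int      # assumed reading-order index within the document


def post_process(predictions, strategy):
    """Pick one prediction when the model returns several instances."""
    if not predictions:
        return None
    if strategy == "first_span":
        return min(predictions, key=lambda p: p.position)
    if strategy == "longest_value":
        return max(predictions, key=lambda p: len(p.value))
    if strategy == "highest_confidence":
        return max(predictions, key=lambda p: p.confidence)
    raise ValueError(f"unknown post-processing strategy: {strategy}")


def levenshtein_distance(a, b):
    """Edit distance counting insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def score(prediction, truth, method="exact_match"):
    """Score a predicted value against the true value."""
    if method == "exact_match":
        return 1.0 if prediction == truth else 0.0
    if method == "levenshtein":
        longest = max(len(prediction), len(truth))
        if longest == 0:
            return 1.0  # two empty strings match exactly
        # Assumption: the distance is normalized by the longer string.
        return 1.0 - levenshtein_distance(prediction, truth) / longest
    raise ValueError(f"unknown scoring method: {method}")


candidates = [
    Prediction("INV-1", 0.60, 0),
    Prediction("INV-1001", 0.75, 1),
    Prediction("INV-10", 0.95, 2),
]
print(post_process(candidates, "longest_value").value)
print(score("ABCDEFGHXY", "ABCDEFGHIJ", "levenshtein"))
```

In this sketch, a prediction that differs from a 10-character true value in its last 2 characters has an edit distance of 2, so the levenshtein score is 1 - 2/10, while exact_match scores the same prediction as 0.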
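You can also reorder fields from this view.
If you publish a new project version before retriggering training, changes to the document type settings are not reflected in the new project version.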
Workaround: To avoid this, retrain the document type after making modifications to the document type fields. You can do this by tagging or confirming additional documents for that type before publishing a new version.
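As a rough illustration of the Phone and ID Number content types described earlier in this section, the following Python sketch applies the stated formatting rules. The function names are hypothetical, and two details are assumptions because the text does not specify them: the ID Number rule is applied at the first : character, and the : itself is dropped along with everything before it.

```python
import re


def normalize_phone(raw):
    """Remove letters and parentheses, then replace spaces with dashes."""
    cleaned = re.sub(r"[A-Za-z()]", "", raw).strip()
    return re.sub(r"\s+", "-", cleaned)


def normalize_id_number(raw):
    """Drop any characters that come before the ':' character.

    Assumption: the first ':' is used, and the ':' is removed as well.
    """
    before, separator, after = raw.partition(":")
    return after.strip() if separator else raw.strip()


print(normalize_phone("(555) 123 4567"))
print(normalize_id_number("ID No: AB12345"))
print(normalize_id_number("A1:B2"))  # the part before ':' is lost
```

The last call shows why the String content type is the safer choice when the ID itself can contain : characters: under these assumptions, the leading part of the value would be discarded.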
Search field names
You can search through the available field names. To do so, use the search bar in the top-left corner of the Document type manager interface. For a more efficient search, use the Filter feature to filter by Content type.
Figure 4. Search field names

Delete fields
Select the Delete button next to the field you want to delete.
Figure 5. Delete a field

You can also select several (or all) fields and delete them at once. To do so, select the check mark next to the fields you want to delete and then select Delete.
Figure 6. Delete several fields at once

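Classification fields
Classification fields are data points that refer to the document as a whole. For example, the expense type of a receipt (food, hotel, airline, or transportation) and the currency of an invoice (USD, EUR, JPY) are classification fields.
The following limitations currently apply to the Classification fields feature:
- When using the Extract Document Data activity, classification fields are supported for modern project extractors and out-of-the-box models, but not for classic project extractors.
- Classification fields are extracted for custom document types only after a successful training.
Edit or add classification fields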
To add a new classification field, select Add field and type in a name for the new field.
You can also reorder fields from this view.
Figure 7. Add a new classification field

To check the classification field ID, select Advanced settings next to the needed classification field.
Figure 8. Classification fields advanced settings

Edit or add classes
To add a new class for a classification field, select Add class and type in a class name and an optional description.
Each classification field must contain at least two classes.
Figure 9. Add a new class

You can edit the name and description of each class.
You can also reorder classes from this view.
To remove a class, select Delete next to the class you want to remove.
Figure 10. Delete a class

Settings
You can change the document type settings from the Settings tab.
Figure 11. Model settings

You can change the following settings:
- Base model: dataset size estimations used in the Recommended Actions depend on the base model used for training. Using the base model most similar to your document type reduces the amount of annotation work required.
- Number of languages: dataset size estimations used in the Recommended Actions depend on the number of languages in the dataset. More languages generally require annotating more data.
Search documents
You can search uploaded documents by document name. To do so, use the search bar from the left corner of the Build section. For a more efficient search, use the Filter feature to filter by:
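- Document type: select the desired document type from the drop-down list.
- Upload date: select the date range in which the documents were uploaded.
- Status: select the status of the documents.
- Tags: select the tags to filter by.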
Figure 12. Filter documents

Project and model scores
You can check your project's overall score from the top right corner. This score factors in the classifier and extractor scores for all document types. Select Project score to display the Measure section. You can check more in-depth performance measurements in that section.
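You can check the score of each document type individually in the Document types section. This score factors in the overall performance of the model, as well as the size and quality of the dataset.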
You need to upload at least 10 documents to get a project score. For a document type score, you need at least 10 documents under the same document type.

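If you select the score label, you can view the model rating. Model rating is a feature intended to help you visualize the performance of a classification model. It is expressed as a model score between 0 and 100, as follows:
- Poor (0-49)
- Average (50-69)
- Good (70-89)
- Excellent (90-100)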
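The mapping from a model score to the four rating bands listed above can be sketched in Python as follows. The function name and the English band labels are illustrative assumptions, and the sketch assumes whole-number scores, since the bands are given as integer ranges.

```python
def model_rating(score):
    """Map a 0-100 model score to its rating band."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score <= 49:
        return "Poor"
    if score <= 69:
        return "Average"
    if score <= 89:
        return "Good"
    return "Excellent"


for example_score in (49, 50, 89, 90):
    print(example_score, model_rating(example_score))
```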
Select Detailed model scores to go to the Measure section for detailed information.
