Document Understanding User Guide
Measure
From the Measure section, you can check the overall status of your project and identify areas with potential for improvement.
Project measure
The main measurement on the page is the overall Project score.
This measurement factors in the classifier and extractor scores for all document types. The score of each factor corresponds to the model rating and can be viewed in Classification Measure and Extraction Measure respectively.
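Model rating is a feature intended to help you visualize the performance of a classification model. It is expressed as a model score from 0 to 100, as follows:
- Poor (0-49)
- Average (50-69)
- Good (70-89)
- Excellent (90-100)
Regardless of the model score, it is up to you to decide when to stop training, based on your project needs. Even if a model is rated as Excellent, that does not mean it will meet every business requirement.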
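The rating bands above can be expressed as a simple lookup. The following is a minimal sketch only: the function name `model_rating` is hypothetical and not part of any UiPath API, and treating fractional scores below a band's lower bound as belonging to the band beneath it (for example, 49.5 as Poor) is an assumption.

```python
def model_rating(score: float) -> str:
    """Map a 0-100 model score to one of the four rating bands.

    Hypothetical helper for illustration; not a UiPath API.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 90:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Average"
    return "Poor"


if __name__ == "__main__":
    for sample in (35, 50, 89, 90):
        print(sample, model_rating(sample))
```

A rating is only a summary of the score: two models in the same band can still differ on the fields or document types that matter most to your project.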
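Classification measure
The Classification score factors in the performance of the model as well as the size and quality of the dataset.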
The Classification score is only available if you have more than one document type created.
If you select Classification, two tabs are displayed on the right side:
- Factors: Provides recommendations on how to improve the performance of your model. You can get recommendations on dataset size or trained model performance for each document type.
- Metrics: Provides useful metrics, such as the number of train and test documents, precision, accuracy, recall, and F1 score for each document type.

Extraction measure
The Extraction score factors in the overall performance of the model as well as the size and quality of the dataset. This view is split into document types. You can also go straight to the Annotate view of each document type by selecting Annotate.
If you select any of the available document types from the Extraction view, three tabs are displayed on the right side:
- Factors: Provides recommendations on how to improve the performance of your model. You can get recommendations on dataset size (number of uploaded documents, number of annotated documents) or trained model performance (fields accuracy) for the selected document type.
- Dataset: Provides information about the documents used for training the model, the total number of imported pages, and the total number of labelled pages.
- Metrics: Provides useful information and metrics, such as the field name, training status, and accuracy for the selected document type. You can also access advanced metrics for your extraction models by selecting the Download advanced metrics button, which downloads an Excel file with detailed metrics and model results per batch.

Dataset diagnostics
The Dataset tab helps you build effective datasets by providing feedback and recommendations on the steps needed to achieve good accuracy for the trained model.

The Management bar displays the following dataset status levels:
- Red - More labelled training data is required.
- Orange - More labelled training data is recommended.
- Light green - Labelled training data is within recommendations.
- Dark green - Labelled training data is within recommendations. However, more data might be needed for underperforming fields.
If no fields have been created in the session, the dataset status level is grey.
Compare models
You can compare the performance of two versions of a classification or extraction model from the Measure section.
Classification model comparison
To compare the performance of two versions of a classification model, first navigate to the Measure section. Then, select Compare model for the classification model you are interested in.
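You can choose the versions to compare from the drop-down list at the top of each column. By default, the current version (the latest available version) is selected on the left, and the latest published version is selected on the right.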
Figure 1. Classification model comparison

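The comparison of classification models relies on four key metrics:
- Precision: the ratio of correctly predicted positive instances to the total number of instances predicted as positive. A model with high precision has a low false positive rate.
- Accuracy: the ratio of correctly predicted samples (both true positives and true negatives) to the total number of samples.
- Recall: the proportion of actual positive cases that are correctly identified.
- F1 score: the harmonic mean of precision and recall, which balances the two metrics and reflects the trade-off between false positives and false negatives.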
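To make the four metrics concrete, the sketch below computes them from confusion-matrix counts for a single document type, using the standard textbook definitions (F1 is the harmonic mean of precision and recall). The names `ConfusionCounts` and `classification_metrics` are hypothetical, and returning 0.0 when a denominator is zero is an assumption; this is not a description of how the product computes its values internally.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts for one document type (hypothetical container for illustration)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def _ratio(numerator: float, denominator: float) -> float:
    # Assumption: report 0.0 instead of failing when there is nothing to divide by.
    return numerator / denominator if denominator else 0.0


def classification_metrics(c: ConfusionCounts) -> dict:
    """Return precision, accuracy, recall, and F1 using the standard definitions."""
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = _ratio(2 * precision * recall, precision + recall)
    return {"precision": precision, "accuracy": accuracy, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Illustrative counts only, not real project data.
    print(classification_metrics(ConfusionCounts(tp=8, fp=2, tn=88, fn=2)))
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a model with high precision but low recall (or the reverse) still receives a low F1 score.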
The order of document types displayed is the one used in the latest version from the comparison. If a document type is not available in one of the compared versions, the values for each measure are replaced with N/A.
If a field was removed in the current version but it was available in the older version before the Compare model feature was available, the name is replaced with Unknown.
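The ordering and N/A behavior described above can be sketched as a small merge of two per-version metric tables. This is an illustration under stated assumptions, not the product's implementation: the function `compare_versions`, the dictionary layout, and the choice to append document types that exist only in the older version after those of the latest version are all hypothetical.

```python
NA = "N/A"
METRICS = ("precision", "accuracy", "recall", "f1")


def compare_versions(latest: dict, other: dict) -> list:
    """Build side-by-side comparison rows for two model versions.

    Both arguments map a document type name to a dict of metric values.
    Rows follow the order of the latest version; types found only in the
    other version are appended afterwards (assumption).
    """
    ordered = list(latest) + [name for name in other if name not in latest]
    rows = []
    for doc_type in ordered:
        row = {"document_type": doc_type}
        for side, version in (("latest", latest), ("other", other)):
            values = version.get(doc_type)
            for metric in METRICS:
                # A document type missing from a version shows N/A for every metric.
                row[f"{side}_{metric}"] = NA if values is None else values.get(metric, NA)
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Illustrative values only, not real project data.
    latest = {"Invoice": {"precision": 0.9, "accuracy": 0.95, "recall": 0.88, "f1": 0.89}}
    other = {"Receipt": {"precision": 0.7, "accuracy": 0.8, "recall": 0.75, "f1": 0.72}}
    for row in compare_versions(latest, other):
        print(row)
```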
Extraction model comparison
To compare the performance of two versions of an extraction model, first navigate to the Measure section. Then, select Compare model for the extraction model you are interested in.
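You can choose the versions to compare from the drop-down list at the top of each column. By default, the current version (the latest available version) is selected on the left, and the latest published version is selected on the right.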
Figure 2. Extraction model comparison

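The comparison of extraction models relies on the following key metrics:
- Field name: the name of the annotated field.
- Content type: the content type of the field:
  - String
  - Number
  - Date
  - Phone
  - ID number
- Rating: the model score, intended to help you visualize the performance of the extracted field.
- Accuracy: the proportion of correct predictions out of the total number of predictions made by the model.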
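As a rough illustration of the accuracy definition above, the sketch below computes the share of correct predictions for a single field. The function `field_accuracy` is hypothetical, and counting a prediction as correct only when it exactly matches the annotated value is an assumption; the product may apply different matching rules.

```python
def field_accuracy(predicted: list, expected: list) -> float:
    """Share of predictions that match the annotated value for one field.

    Hypothetical helper; exact string equality is an assumption.
    """
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not predicted:
        return 0.0  # Assumption: no predictions means no measurable accuracy.
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(predicted)


if __name__ == "__main__":
    # Illustrative values only: three of four predictions match.
    predicted = ["2024-01-05", "2024-02-10", "2024-03-01", "2024-04-12"]
    expected = ["2024-01-05", "2024-02-10", "2024-03-07", "2024-04-12"]
    print(field_accuracy(predicted, expected))
```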
The order of field names displayed is the one used in the latest version from the comparison. If a field name is not available in one of the compared versions, the values for each measure are replaced with N/A.
If a field was removed in the current version but it was available in the older version before the Compare model feature was available, the name is replaced with Unknown.
You can also compare the field score for tables from the Table section.
On the comparison page, you can download the advanced metrics file for each version by selecting the Download advanced metrics button.