document-understanding
latest
false
重要 :
新发布内容的本地化可能需要 1-2 周的时间才能完成。

Document Understanding API 指南
上次更新日期 2026年2月24日
从Document Understanding API v1 升级到 v2时,适用以下重大变更。某些更新需要执行操作,以确保您的自动化继续按预期运行。
建议采取以下步骤将自动化从Document Understanding API v1 迁移到 v2 :
- 更新路由路径。
- 更新 IXP 提取结果处理(从“表格”更新为“字段组”)。
- 重新构建并重新部署到 non-production 环境。
- 验证自动化是否在所有四个维度上按预期运行:
- 发现、
- 分类,
- 提取,
- 验证。
基于标签的路由已规范化,以提高 API 的一致性。
在 v2 中,所有使用先前路径结构的基于标签的端点都会返回400 错误请求。
此更改会影响
{projectId}后直接包含{tag}所有操作。
v1
POST /projects/{projectId}/{tag}/classification
POST /projects/{projectId}/{tag}/classification
v2
POST /projects/{projectId}/tags/{tag}/classificationPOST /projects/{projectId}/tags/{tag}/classification需要执行的操作
在代码库中搜索以下内容:
/projects/{projectId}/{tag}/替换为:
/projects/{projectId}/tags/{tag}/确保在所有环境中一致应用此更新。
不再返回
fields属性。任何引用字段的反序列化逻辑或强类型模型都将失败或返回空值。
v1
{
"fields": [
{
"name": "InvoiceNumber",
"type": "string"
}
]
}
{
"fields": [
{
"name": "InvoiceNumber",
"type": "string"
}
]
}
{
"taxonomy": {
...
}
}
{
"taxonomy": {
...
}
}
- 更新您的响应模型以使用
taxonomy对象。 - 重构以前依赖于
fields下游逻辑。
如果您的模型需要使用单个
tag属性,则响应解析可能会失败。如果您使用的是严格架构验证,则需要更新。
v1
{
"tag": "staging"
}
{
"tag": "staging"
}
{
"tags": ["staging"]
}
{
"tags": ["staging"]
}
需要执行的操作:
- 更新响应模型以将
tag替换为tags。 - 调整逻辑如果采用单个标签值。
此更改仅影响 IXP 提取结果。
在 v1 中,API 以一个或多个表格的形式返回 IXP 提取结果。这是从 IXP 的“字段组”概念到表格的映射。这些表中的所有值都表示为文本(字符串),无论其原始 IXP 数据类型如何。
在 v2 中,API 将 IXP 提取结果作为字段组返回。这引入了与 IXP 字段组概念的一对一映射。每个字段都保留其实际的 IXP 数据类型(例如, Text 、 Number 、 Date 、 MonetaryQuantity )。
v1(IXP 提取结果以表格形式返回;以文本形式表示的值)
{
"Tables": [
{
"FieldId": "Seller",
"FieldName": "Seller",
"IsMissing": false,
"DataSource": "Automatic",
"DataVersion": 0,
"OperatorConfirmed": false,
"Values": [
{
"OperatorConfirmed": true,
"Confidence": 0.9999834,
"OcrConfidence": 1.0,
"Cells": [
{
"RowIndex": 0,
"ColumnIndex": 0,
"IsHeader": true,
"IsMissing": false,
"OperatorConfirmed": false,
"DataSource": "Automatic",
"DataVersion": 0,
"Values": [
{
"Components": [],
"Value": "Name",
"UnformattedValue": "Name",
"Reference": {
"TextStartIndex": 0,
"TextLength": 0,
"Tokens": []
},
"DerivedFields": [],
"Confidence": -1.0,
"OperatorConfirmed": false,
"OcrConfidence": 1.0,
"TextType": "Unknown"
}
]
}
],
"ColumnInfo": [
{
"FieldId": "Name",
"FieldName": "Name",
"FieldType": "Text"
}
],
"NumberOfRows": 2
}
]
}
]
}{
"Tables": [
{
"FieldId": "Seller",
"FieldName": "Seller",
"IsMissing": false,
"DataSource": "Automatic",
"DataVersion": 0,
"OperatorConfirmed": false,
"Values": [
{
"OperatorConfirmed": true,
"Confidence": 0.9999834,
"OcrConfidence": 1.0,
"Cells": [
{
"RowIndex": 0,
"ColumnIndex": 0,
"IsHeader": true,
"IsMissing": false,
"OperatorConfirmed": false,
"DataSource": "Automatic",
"DataVersion": 0,
"Values": [
{
"Components": [],
"Value": "Name",
"UnformattedValue": "Name",
"Reference": {
"TextStartIndex": 0,
"TextLength": 0,
"Tokens": []
},
"DerivedFields": [],
"Confidence": -1.0,
"OperatorConfirmed": false,
"OcrConfidence": 1.0,
"TextType": "Unknown"
}
]
}
],
"ColumnInfo": [
{
"FieldId": "Name",
"FieldName": "Name",
"FieldType": "Text"
}
],
"NumberOfRows": 2
}
]
}
]
}{
"Fields": [
{
"FieldId": "Default.Seller",
"FieldName": "Seller",
"FieldType": "FieldGroup",
"IsMissing": false,
"DataSource": "Automatic",
"Values": [
{
"Components": [
{
"FieldId": "Default.Seller.Name",
"FieldName": "Name",
"FieldType": "Text",
"IsMissing": false,
"DataSource": "Automatic",
"Values": [
{
"Components": [],
"Value": "John Doe",
"UnformattedValue": "John Doe",
"Reference": {
"TextStartIndex": 0,
"TextLength": 8,
"Tokens": [
"..."
]
},
"DerivedFields": [],
"Confidence": 0.9999834,
"OperatorConfirmed": false,
"OcrConfidence": 0.90999997,
"TextType": "Text",
"ValidatorNotes": "",
"ValidatorNotesInfo": ""
}
]
}
]
}
]
}
]
}{
"Fields": [
{
"FieldId": "Default.Seller",
"FieldName": "Seller",
"FieldType": "FieldGroup",
"IsMissing": false,
"DataSource": "Automatic",
"Values": [
{
"Components": [
{
"FieldId": "Default.Seller.Name",
"FieldName": "Name",
"FieldType": "Text",
"IsMissing": false,
"DataSource": "Automatic",
"Values": [
{
"Components": [],
"Value": "John Doe",
"UnformattedValue": "John Doe",
"Reference": {
"TextStartIndex": 0,
"TextLength": 8,
"Tokens": [
"..."
]
},
"DerivedFields": [],
"Confidence": 0.9999834,
"OperatorConfirmed": false,
"OcrConfidence": 0.90999997,
"TextType": "Text",
"ValidatorNotes": "",
"ValidatorNotesInfo": ""
}
]
}
]
}
]
}
]
}- 为方便起见,在 v1 中,IXP“类似表格”的结果在“字段”数组中表示为 FieldType.Table ,并映射到 表格 结构。
- 在v2 版本中,IXP 结果以字段类型.字段形式表示,并以字段形式返回(与 IXP 字段形式一:1)。任何需要FieldType.Table或表格的逻辑都将中断。
需要执行的操作
- 更新 IXP 提取结果处理方式,以使用字段组而非表格。
- 如果您的自动化将 IXP 提取结果视为表格,请更新解析逻辑以处理新的字段组结构和输入的字段。
- 将基于字符串的解析替换为类型感知处理。例如:
- 日期:解析为日期值
- 数字:解析为数值
- MonetaryQuantity:将值和货币作为单个数据对象处理
{
"Fields": [
{
"FieldId": "Seller",
"FieldName": "Seller",
"FieldType": "Table",
"IsMissing": false,
"DataSource": "Automatic",
"Values": [
{
"Components": [
{
"FieldId": "Seller.Header",
"FieldName": "Header",
"FieldType": "Internal",
"IsMissing": false,
"DataSource": "Automatic",
"Values": [],
"DataVersion": 0,
"OperatorConfirmed": false,
"ValidatorNotes": ""
},
{
"FieldId": "Seller.Body",
"FieldName": "Body",
"FieldType": "Internal",
"IsMissing": false,
"DataSource": "Automatic",
"Values": [],
"DataVersion": 0,
"OperatorConfirmed": false,
"ValidatorNotes": ""
}
],
"Value": "",
"UnformattedValue": "",
"Reference": {
"TextStartIndex": 0,
"TextLength": 0,
"Tokens": []
},
"DerivedFields": [],
"Confidence": 0.9999834,
"OperatorConfirmed": true,
"OcrConfidence": 1.0,
"TextType": "Unknown"
}
],
"DataVersion": 0,
"OperatorConfirmed": false,
"ValidatorNotes": ""
}
]
}{
"Fields": [
{
"FieldId": "Seller",
"FieldName": "Seller",
"FieldType": "Table",
"IsMissing": false,
"DataSource": "Automatic",
"Values": [
{
"Components": [
{
"FieldId": "Seller.Header",
"FieldName": "Header",
"FieldType": "Internal",
"IsMissing": false,
"DataSource": "Automatic",
"Values": [],
"DataVersion": 0,
"OperatorConfirmed": false,
"ValidatorNotes": ""
},
{
"FieldId": "Seller.Body",
"FieldName": "Body",
"FieldType": "Internal",
"IsMissing": false,
"DataSource": "Automatic",
"Values": [],
"DataVersion": 0,
"OperatorConfirmed": false,
"ValidatorNotes": ""
}
],
"Value": "",
"UnformattedValue": "",
"Reference": {
"TextStartIndex": 0,
"TextLength": 0,
"Tokens": []
},
"DerivedFields": [],
"Confidence": 0.9999834,
"OperatorConfirmed": true,
"OcrConfidence": 1.0,
"TextType": "Unknown"
}
],
"DataVersion": 0,
"OperatorConfirmed": false,
"ValidatorNotes": ""
}
]
}