
Document Understanding API Guide
When upgrading from Document Understanding API v1 to v2, the following breaking changes apply. Some updates require action to ensure your automations continue to work as expected.
To migrate your automations from Document Understanding API v1 to v2, take the following steps:
- Update route paths.
- Update IXP extraction result handling (from Tables to FieldGroups).
- Rebuild and redeploy to a non-production environment.
- Validate that the automation works as expected in all four dimensions:
  - Discovery
  - Classification
  - Extraction
  - Validation
Tag-based routes have been normalized to improve consistency across the API.
All tag-based endpoints using the previous path structure return 400 Bad Request in v2.
In v1, {tag} appeared directly after {projectId}. In v2, a tags segment precedes it.
v1
POST /projects/{projectId}/{tag}/classification
v2
POST /projects/{projectId}/tags/{tag}/classification
Required action:
- Replace /projects/{projectId}/{tag}/ with /projects/{projectId}/tags/{tag}/ in every tag-based route.
- Ensure this update is applied consistently across all environments.
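As a rough illustration of the path change, the sketch below rewrites a v1 tag-based path into its v2 form. The helper name to_v2_path and the TAG_BASED_OPERATIONS tuple are hypothetical. Only the classification route shown above is assumed to be tag-based, so extend the tuple for any other tag-based routes you call.

```python
import re

# ASSUMPTION: only the operations listed here are tag-based routes.
# The guide shows "classification"; add other operation names you rely on.
TAG_BASED_OPERATIONS = ("classification",)


def to_v2_path(path: str, operations=TAG_BASED_OPERATIONS) -> str:
    """Insert the 'tags' segment between {projectId} and {tag}.

    Paths that are already in v2 form, or that do not end in a known
    tag-based operation, are returned unchanged.
    """
    ops = "|".join(re.escape(op) for op in operations)
    # /projects/<projectId>/<tag>/<operation>, followed by end, "/" or "?"
    pattern = rf"(/projects/[^/]+)/([^/]+)/({ops})(?=$|[/?])"
    return re.sub(pattern, r"\1/tags/\2/\3", path, count=1)


print(to_v2_path("/projects/{projectId}/{tag}/classification"))
```

Because a v2 path no longer matches the v1 pattern, running the helper twice on the same path should be harmless, which makes it easier to apply across configuration files in several environments.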
The fields property is no longer returned. Any deserialization logic or strongly typed model that references fields will fail or return null values.
v1
{
  "fields": [
    {
      "name": "InvoiceNumber",
      "type": "string"
    }
  ]
}
v2
{
  "taxonomy": {
    ...
  }
}
Required action:
- Update your response models to consume the taxonomy object.
- Refactor downstream logic that previously depended on fields.
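One way to make this change visible during migration is to read the response through a small accessor that fails loudly when it still sees the v1 shape. This is a minimal sketch: read_taxonomy is a hypothetical helper, and the contents of the taxonomy object are treated as opaque because the guide does not spell them out.

```python
from typing import Any, Mapping


def read_taxonomy(response: Mapping[str, Any]) -> Any:
    """Return the v2 'taxonomy' object, rejecting v1-shaped responses."""
    if "taxonomy" in response:
        # The inner structure is not described here, so it is returned as-is.
        return response["taxonomy"]
    if "fields" in response:
        # v1 responses carried 'fields'; v2 no longer returns it.
        raise ValueError("v1-shaped response: found 'fields', expected 'taxonomy'")
    raise KeyError("taxonomy")


# Usage with a placeholder payload (the inner keys are illustrative only).
print(read_taxonomy({"taxonomy": {"placeholder": True}}))
```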
The tag property has been replaced by the tags property, which returns an array. If you use strict schema validation, update your schemas accordingly.
v1
{
  "tag": "staging"
}
v2
{
  "tags": ["staging"]
}
Required action:
- Update response models to replace tag with tags.
- Adjust any logic that assumes a single tag value.
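While both API versions are still in use, for example across environments that are migrated at different times, a small normalizer can hide the difference from downstream logic. The sketch below is an assumption-laden example: read_tags is a hypothetical helper, and it simply wraps a v1 tag string in a list.

```python
from typing import Any, List, Mapping


def read_tags(response: Mapping[str, Any]) -> List[str]:
    """Return tags as a list for both the v1 ('tag') and v2 ('tags') shapes."""
    if "tags" in response:
        # v2 shape: already an array.
        return list(response["tags"])
    if "tag" in response and response["tag"] is not None:
        # v1 shape: a single string, wrapped so callers always get a list.
        return [response["tag"]]
    return []


print(read_tags({"tag": "staging"}))     # v1 payload from the example above
print(read_tags({"tags": ["staging"]}))  # v2 payload from the example above
```

Callers that previously compared a single value can then switch to a membership test such as "staging" in read_tags(response).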
IXP extraction result schema changes: tables removed and FieldType.Table replaced by FieldType.FieldGroup
This change affects only IXP extraction results.
In v1, the API returned an IXP extraction result as one or more Tables. This was a mapping of the IXP concept of FieldGroups to tables. All values inside those tables were represented as text (string), regardless of their original IXP data type.
In v2, the API returns IXP extraction results as FieldGroups. This introduces a 1-to-1 mapping with the IXP FieldGroup concept. Each field preserves its actual IXP data type (for example, Text, Number, Date, MonetaryQuantity).
v1 (IXP extraction result returned as Tables; values represented as text)
{
  "Tables": [
    {
      "FieldId": "Seller",
      "FieldName": "Seller",
      "IsMissing": false,
      "DataSource": "Automatic",
      "DataVersion": 0,
      "OperatorConfirmed": false,
      "Values": [
        {
          "OperatorConfirmed": true,
          "Confidence": 0.9999834,
          "OcrConfidence": 1.0,
          "Cells": [
            {
              "RowIndex": 0,
              "ColumnIndex": 0,
              "IsHeader": true,
              "IsMissing": false,
              "OperatorConfirmed": false,
              "DataSource": "Automatic",
              "DataVersion": 0,
              "Values": [
                {
                  "Components": [],
                  "Value": "Name",
                  "UnformattedValue": "Name",
                  "Reference": {
                    "TextStartIndex": 0,
                    "TextLength": 0,
                    "Tokens": []
                  },
                  "DerivedFields": [],
                  "Confidence": -1.0,
                  "OperatorConfirmed": false,
                  "OcrConfidence": 1.0,
                  "TextType": "Unknown"
                }
              ]
            }
          ],
          "ColumnInfo": [
            {
              "FieldId": "Name",
              "FieldName": "Name",
              "FieldType": "Text"
            }
          ],
          "NumberOfRows": 2
        }
      ]
    }
  ]
}
v2 (IXP extraction result returned as FieldGroups; values keep their IXP data types)
{
  "Fields": [
    {
      "FieldId": "Default.Seller",
      "FieldName": "Seller",
      "FieldType": "FieldGroup",
      "IsMissing": false,
      "DataSource": "Automatic",
      "Values": [
        {
          "Components": [
            {
              "FieldId": "Default.Seller.Name",
              "FieldName": "Name",
              "FieldType": "Text",
              "IsMissing": false,
              "DataSource": "Automatic",
              "Values": [
                {
                  "Components": [],
                  "Value": "John Doe",
                  "UnformattedValue": "John Doe",
                  "Reference": {
                    "TextStartIndex": 0,
                    "TextLength": 8,
                    "Tokens": [
                      "..."
                    ]
                  },
                  "DerivedFields": [],
                  "Confidence": 0.9999834,
                  "OperatorConfirmed": false,
                  "OcrConfidence": 0.90999997,
                  "TextType": "Text",
                  "ValidatorNotes": "",
                  "ValidatorNotesInfo": ""
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
- In v1, IXP “table-like” results were represented as FieldType.Table in the Fields array and mapped to a tables structure for convenience.
- In v2, IXP results are represented as FieldType.FieldGroup and returned as FieldGroups (1:1 with IXP FieldGroup). Any logic expecting FieldType.Table or tables will break.
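To make the structural difference concrete, the following sketch walks a v2 result and turns each FieldGroup value into a flat dictionary keyed by component name. It relies only on the keys visible in the v2 example above (Fields, FieldType, Values, Components, FieldName, Value). The helper names and the choice to take the first value of each component are assumptions made for illustration.

```python
from typing import Any, Dict, Iterator


def uses_legacy_tables(result: Dict[str, Any]) -> bool:
    """True if the payload still has the v1 shape (Tables or FieldType.Table)."""
    if "Tables" in result:
        return True
    return any(f.get("FieldType") == "Table" for f in result.get("Fields", []))


def iter_field_group_rows(result: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per FieldGroup value, keyed by component FieldName."""
    for field in result.get("Fields", []):
        if field.get("FieldType") != "FieldGroup":
            continue
        for group_value in field.get("Values", []):
            row: Dict[str, Any] = {"_group": field.get("FieldName")}
            for component in group_value.get("Components", []):
                values = component.get("Values", [])
                # ASSUMPTION: one value per component; take the first if present.
                row[component["FieldName"]] = values[0]["Value"] if values else None
            yield row


# Trimmed-down version of the v2 example above.
sample = {
    "Fields": [{
        "FieldName": "Seller",
        "FieldType": "FieldGroup",
        "Values": [{
            "Components": [{
                "FieldName": "Name",
                "FieldType": "Text",
                "Values": [{"Value": "John Doe"}],
            }]
        }],
    }]
}

print(list(iter_field_group_rows(sample)))
```

A check such as uses_legacy_tables can be useful in a non-production run to spot code paths that still receive or expect the v1 shape.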
Required action:
- Update IXP extraction result handling to use FieldGroups instead of Tables.
- If your automation treats IXP extraction results as tables, update parsing logic to handle the new FieldGroup structure and typed fields.
- Replace string-based parsing with type-aware handling. For example:
  - Date: parse as a date value
  - Number: parse as a numeric value
  - MonetaryQuantity: handle value and currency as a single data object
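The type-aware handling described above could be organized as a lookup from FieldType to a converter. This is only a sketch under stated assumptions. The guide does not show how Number, Date, or MonetaryQuantity values are serialized, so the converters below assume a numeric string for Number and an ISO 8601 string (YYYY-MM-DD) for Date, and pass MonetaryQuantity through untouched as a single object. The names CONVERTERS and convert_value are hypothetical. Verify the assumed formats against real responses before relying on them.

```python
from datetime import date
from decimal import Decimal
from typing import Any, Callable, Dict

# ASSUMPTION: wire formats are not documented in this guide.
# Number is assumed to be a numeric string, Date an ISO 8601 date string.
CONVERTERS: Dict[str, Callable[[Any], Any]] = {
    "Text": str,
    "Number": lambda raw: Decimal(str(raw)),
    "Date": lambda raw: date.fromisoformat(str(raw)),
}


def convert_value(field_type: str, raw: Any) -> Any:
    """Convert a raw field value according to its FieldType.

    Types without a registered converter (for example MonetaryQuantity)
    are returned unchanged so value and currency stay together.
    """
    if raw is None:
        return None
    converter = CONVERTERS.get(field_type)
    return converter(raw) if converter else raw


print(convert_value("Number", "12.50"))
print(convert_value("Date", "2024-01-31"))
```

Keeping the mapping in one place means a newly encountered FieldType only needs one new entry instead of changes throughout the parsing code.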
v1 (for reference: the Fields array entry represented as FieldType.Table)
{
  "Fields": [
    {
      "FieldId": "Seller",
      "FieldName": "Seller",
      "FieldType": "Table",
      "IsMissing": false,
      "DataSource": "Automatic",
      "Values": [
        {
          "Components": [
            {
              "FieldId": "Seller.Header",
              "FieldName": "Header",
              "FieldType": "Internal",
              "IsMissing": false,
              "DataSource": "Automatic",
              "Values": [],
              "DataVersion": 0,
              "OperatorConfirmed": false,
              "ValidatorNotes": ""
            },
            {
              "FieldId": "Seller.Body",
              "FieldName": "Body",
              "FieldType": "Internal",
              "IsMissing": false,
              "DataSource": "Automatic",
              "Values": [],
              "DataVersion": 0,
              "OperatorConfirmed": false,
              "ValidatorNotes": ""
            }
          ],
          "Value": "",
          "UnformattedValue": "",
          "Reference": {
            "TextStartIndex": 0,
            "TextLength": 0,
            "Tokens": []
          },
          "DerivedFields": [],
          "Confidence": 0.9999834,
          "OperatorConfirmed": true,
          "OcrConfidence": 1.0,
          "TextType": "Unknown"
        }
      ],
      "DataVersion": 0,
      "OperatorConfirmed": false,
      "ValidatorNotes": ""
    }
  ]
}