
Document Understanding API guide
Overview
Document Understanding™ APIs can be an alternative to the RPA approach. By initiating an API call, you can:
- Look up information about a project, or about the extractors and classifiers used in a project.
- Use the Digitization API.
- Classify documents using specialized models (see the "Classifying a document" example).
- Extract document data using specialized models (see the "Start the extraction fields request" example).
- Validate information that was previously digitized, classified, and/or extracted.
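Depending on your use case, you can choose between the asynchronous and the synchronous APIs.
Use the asynchronous APIs when:
- You need to process multi-page documents.
- You need to handle multiple operations at the same time. Asynchronous APIs allow concurrent processing and avoid idle time, which improves system throughput. This means you can send a document and continue with another task without waiting for the response.
- You have a large dataset to process and the processing takes a significant amount of time.
Use the synchronous APIs when:
- You only need to process single-page images.
- You need real-time, request-response interaction and do not need multitasking. A synchronous API call blocks other operations while it waits for the response.
- You have a small dataset to process.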
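The criteria above can be sketched as a small decision helper. This is only an illustration: the function name `choose_api_mode` and its parameters are hypothetical and are not part of the Document Understanding API. The default page limit of 1 follows the "single-page images" criterion; because this guide also states that synchronous consumption supports documents of up to 5 pages, the limit is left as a parameter.

```python
def choose_api_mode(
    page_count: int,
    needs_concurrency: bool = False,
    large_dataset: bool = False,
    max_sync_pages: int = 1,
) -> str:
    """Return "async" or "sync" based on the criteria listed in this guide.

    All names here are illustrative assumptions, not API identifiers.
    """
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    # Any of the asynchronous criteria is enough to prefer the async APIs.
    if page_count > max_sync_pages or needs_concurrency or large_dataset:
        return "async"
    return "sync"
```

For example, `choose_api_mode(3)` returns `"async"`, while `choose_api_mode(3, max_sync_pages=5)` returns `"sync"`.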
The following statuses apply to the asynchronous APIs:
- NotStarted: job was created and it's waiting to be processed.
- Running: job was created, was picked up, and is currently being worked on.
- Failed: job finished but failed.
- Succeeded: job finished and succeeded.
Classification & Extraction APIs are available for both synchronous and asynchronous consumption. The synchronous consumption supports multi-page documents, up to 5 pages, while the asynchronous consumption posts the request via a start method and retrieves the result via polling.
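The start-then-poll pattern for asynchronous consumption can be sketched as follows. This is a minimal sketch under stated assumptions, not the service's client: `wait_for_job` and the `get_status` callable are hypothetical. The caller is expected to supply `get_status` so that it performs the HTTP request for the job result and returns one of the status strings listed above. The polling interval and timeout values are arbitrary defaults chosen for illustration.

```python
import time
from typing import Callable

# Status names as listed in this guide for asynchronous jobs.
PENDING_STATUSES = {"NotStarted", "Running"}
FINAL_STATUSES = {"Succeeded", "Failed"}


def wait_for_job(
    get_status: Callable[[], str],
    poll_interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until the job reaches a final status.

    get_status is a hypothetical, caller-supplied function that queries
    the job and returns its status string.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in FINAL_STATUSES:
            return status
        if status not in PENDING_STATUSES:
            raise ValueError(f"Unexpected job status: {status!r}")
        if clock() >= deadline:
            raise TimeoutError(f"Job still {status} after {timeout} seconds")
        # Wait before asking again so the service is not flooded.
        sleep(poll_interval)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real waiting. In production code the defaults (`time.sleep`, `time.monotonic`) apply.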
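Use the Document Understanding APIs to consume the same functionality that is available through RPA. Because the calls are made over HTTP, you can use any programming or scripting language, including RPA.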
You can access the APIs via Swagger: in the toolbar of the Document Understanding™ service, open the Rest API dropdown and select Framework.

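To try out the functionality, use Swagger as a sandbox.
You can choose to use predefined or custom-built models. Custom-built models are the models you create while working with Document Understanding. Predefined models are already available for use and include the predefined out-of-the-box models.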
The data received from calling the Digitization endpoint is retained for seven days. In this timeframe, the result is available via the received document ID. Afterwards, you would need to submit a new digitization request.
The data received from calling the asynchronous Classification and Extraction endpoints is retained for one day (24 hours).
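A client can track these retention windows locally to decide when a new request is needed. The sketch below encodes the two periods stated above. The names `RETENTION`, `expires_at`, and `is_result_available` are hypothetical, and it is an assumption that the retention clock starts at the time the request was submitted and that the result is unavailable exactly at the boundary.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Retention periods as stated in this guide.
RETENTION = {
    "digitization": timedelta(days=7),
    "classification": timedelta(hours=24),  # asynchronous endpoint
    "extraction": timedelta(hours=24),      # asynchronous endpoint
}


def expires_at(kind: str, submitted_at: datetime) -> datetime:
    """Return the assumed moment the stored result stops being retrievable."""
    return submitted_at + RETENTION[kind]


def is_result_available(
    kind: str, submitted_at: datetime, now: Optional[datetime] = None
) -> bool:
    """Return True while the result is still inside its retention window."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now < expires_at(kind, submitted_at)
```

Use timezone-aware `datetime` values for `submitted_at`, since the default `now` is in UTC and Python does not allow comparing naive and aware datetimes.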
Depending on the operation, Document Understanding API calls use the following classes:
- Document Class for digitized documents.
- Extraction Result for extraction results.
- Classification Result for classification results.