Document Understanding User Guide

Last updated: April 15, 2026

Introduction

The UiPath® Document Understanding™ framework facilitates the processing of incoming files, from file digitization to extracted data validation, in an open, extensible, and versatile environment.

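Document Understanding is designed to help you extract information from many document types by combining different approaches. Its main goal is to make data extraction as simple as possible: you create a single workflow that can extract data from a wide variety of documents.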

Before you use the Document Understanding framework, we recommend that you become familiar with its components:

  • Taxonomy: What documents need to be processed, and what data is required from them? Used to define the document types and, for each document type, the pieces of information targeted for data extraction (fields), and to formalize this information into a dedicated taxonomy structure. This metadata is managed through the Taxonomy Manager.
  • Digitization: What does this file contain? Used to obtain the textual content and the structure of the incoming document, turning a file into machine-readable content so it can be processed further downstream.
  • Document Classification: Which document types from the taxonomy are found in this file? Used to automatically determine which document types are present in a digitized file.
  • Document Classification Validation: Is the predicted classification correct? This is how I can review and correct it. Used to assist in the human validation and correction of automatic classification and document splitting results.
  • Classification Training: Did a human review the data? This is how the robot can learn from it. Used to pass the human-validated information back to the classifiers so they can improve their future predictions.
  • Data Extraction: What data can be found in this particular document? Used to capture the information required for the identified document type, within the given input document and classification page range.
  • Data Extraction Validation: Is the extracted information correct? This is how I can review and correct it. Used to assist in the human validation and correction of automatically extracted data.
  • Data Extraction Training: Did a human review the data? This is how the robot can learn from it. Used to pass the human-validated extracted data back to the extractors so they can improve their extraction predictions.
  • Data Consumption: Used to export the validated data so that it can be consumed.
  • Metering & Charging Logic: Used to explain the consumption of units per page for each available service.
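
To make the data flow between these components concrete, here is a toy, in-memory sketch in Python. It is only a conceptual model under stated assumptions: none of the names (Taxonomy, DocumentType, digitize, classify, extract, run_pipeline) are UiPath APIs, keyword matching stands in for real classifiers and extractors, the validate callback stands in for a human validation step, and classification validation and metering are omitted.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


# Hypothetical model of the framework stages; not the UiPath API.
@dataclass
class DocumentType:
    """A taxonomy entry: one document type and the fields to extract from it."""
    name: str
    fields: list[str]


@dataclass
class Taxonomy:
    document_types: dict[str, DocumentType] = field(default_factory=dict)

    def add(self, doc_type: DocumentType) -> None:
        self.document_types[doc_type.name] = doc_type


@dataclass
class Classification:
    document_type: str
    pages: range  # 0-based page indices that belong to this document


def digitize(raw_pages: list[str]) -> list[str]:
    """Stand-in for OCR: the 'file' is already a list of page texts."""
    return [page.strip() for page in raw_pages]


def classify(pages: list[str], taxonomy: Taxonomy) -> list[Classification]:
    """Toy classifier and splitter that uses document type names as keywords."""
    results: list[Classification] = []
    for i, text in enumerate(pages):
        for name in taxonomy.document_types:
            if name.lower() in text.lower():
                results.append(Classification(name, range(i, i + 1)))  # a new document starts
                break
        else:
            # No keyword on this page: extend the previous document (leading pages are dropped).
            if results:
                prev = results[-1]
                results[-1] = Classification(prev.document_type, range(prev.pages.start, i + 1))
    return results


def extract(pages: list[str], doc: Classification, taxonomy: Taxonomy) -> dict[str, str | None]:
    """Toy extractor: reads 'Field: value' lines within the classified page range."""
    values: dict[str, str | None] = {
        f: None for f in taxonomy.document_types[doc.document_type].fields
    }
    for i in doc.pages:
        for line in pages[i].splitlines():
            key, sep, value = line.partition(":")
            if sep and key.strip() in values:
                values[key.strip()] = value.strip()
    return values


Validator = Callable[[str, dict], dict]


def run_pipeline(raw_pages: list[str], taxonomy: Taxonomy, validate: Validator):
    pages = digitize(raw_pages)
    exported, training_feedback = [], []
    for doc in classify(pages, taxonomy):
        predicted = extract(pages, doc, taxonomy)
        validated = validate(doc.document_type, dict(predicted))  # human review stand-in
        training_feedback.append((doc.document_type, predicted, validated))  # input for training
        exported.append({"type": doc.document_type, **validated})  # data consumption
    return exported, training_feedback
```

For example, with "Invoice" and "Receipt" in the taxonomy, a three-page file whose first page mentions "Invoice" and whose third page mentions "Receipt" is split into a two-page invoice (pages 0 and 1) and a one-page receipt. Keeping both the predicted and the validated values in training_feedback mirrors the idea that human corrections are what the training components learn from.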

The following diagram shows the components of the Document Understanding framework and how they relate to one another:

Diagram of the Document Understanding framework

The Document Understanding framework is included in the UiPath.IntelligentOCR.Activities package. Once this package is installed, the Taxonomy Manager wizard appears in the top ribbon of UiPath Studio. The same package contains all the core Document Understanding framework activities.

The scope activities that are part of the Document Understanding framework (Classify Document Scope, Data Extraction Scope, Train Classifiers Scope, and Train Extractors Scope) let you use whichever document classification and data extraction algorithms fit your use case, and then train those algorithms.

The Document Understanding framework can be used not only with the out-of-the-box classifiers and extractors, but also with custom-built ones. These can be created using the abstract classes from the UiPath.DocumentProcessing.Contracts package and implemented as classification or data extraction activities. Custom OCR engines can likewise be created using the abstract classes from the UiPath.OCR.Contracts package.
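
The plug-in idea behind the scope activities and the contracts packages is that the scope depends only on an abstract contract, so built-in and custom implementations are interchangeable. UiPath.DocumentProcessing.Contracts is a .NET package, so the Python below is only an analogy of that pattern: Extractor, RegexExtractor, KeyValueExtractor, and data_extraction_scope are hypothetical names, not members of the package, and the "first extractor that returns a field wins" merge rule is an assumption chosen for illustration, not documented Data Extraction Scope behavior.

```python
from __future__ import annotations

import re
from abc import ABC, abstractmethod


class Extractor(ABC):
    """Hypothetical contract that every extractor, built-in or custom, implements."""

    @abstractmethod
    def extract(self, text: str, fields: list[str]) -> dict[str, str]:
        """Return the values found in `text` for the requested fields."""


class RegexExtractor(Extractor):
    """A custom extractor: one regular expression (with one capture group) per field."""

    def __init__(self, patterns: dict[str, str]) -> None:
        self.patterns = patterns

    def extract(self, text: str, fields: list[str]) -> dict[str, str]:
        found: dict[str, str] = {}
        for name in fields:
            pattern = self.patterns.get(name)
            match = re.search(pattern, text) if pattern else None
            if match:
                found[name] = match.group(1)
        return found


class KeyValueExtractor(Extractor):
    """Another extractor: reads 'Field: value' lines."""

    def extract(self, text: str, fields: list[str]) -> dict[str, str]:
        found: dict[str, str] = {}
        for line in text.splitlines():
            key, sep, value = line.partition(":")
            if sep and key.strip() in fields:
                found[key.strip()] = value.strip()
        return found


def data_extraction_scope(text: str, fields: list[str], extractors: list[Extractor]) -> dict[str, str]:
    """Run extractors in order; each only sees the fields still missing.

    The scope knows nothing but the abstract Extractor type, so any
    implementation can be plugged in without changing this function.
    """
    result: dict[str, str] = {}
    for extractor in extractors:
        missing = [f for f in fields if f not in result]
        if not missing:
            break
        for name, value in extractor.extract(text, missing).items():
            result.setdefault(name, value)  # assumed rule: first extractor wins
    return result
```

Because Extractor is abstract, instantiating it directly raises TypeError; only concrete subclasses can be passed to the scope, which is what lets a custom extractor sit alongside the others.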

Resources

Dedicated Document Understanding courses can be found in the UiPath RPA Academy.

The UiPath Community Forum is where you can get support from our growing user community.