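Document Understanding Activities
Data Extraction Scope
UiPath.IntelligentOCR.Activities.DataExtraction.DataExtractionScope
Description
Provides a scope for extractor activities, enabling you to configure them according to the document types defined in the taxonomy. The output of the activity is stored in an ExtractionResult variable, which contains all automatically extracted data and can be used as input for the Export Extraction Results activity. This activity also features a Configure Extractors wizard, which lets you specify exactly which fields to extract from the document types defined in the taxonomy.
Project compatibility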
Windows - Legacy | Windows
Configuration
Designer panel
Input
- DocumentPath - The path to the document you want to validate. This field supports only strings and String variables.
  Note: The supported file types for this property field are .png, .gif, .jpe, .jpg, .jpeg, .tiff, .tif, .bmp, and .pdf.
- DocumentText - The text of the document itself, stored in a String variable. This value can be retrieved from the Digitize Document activity. Visit Digitize Document for more information on how to achieve this. This field supports only strings and String variables.
- DocumentObjectModel - The Document Object Model you want to use to validate the document against. This model is stored in a Document variable and can be retrieved from the Digitize Document activity. Visit Digitize Document for more information on how to achieve this. This field supports only Document variables.
- Taxonomy - The Taxonomy against which the document is to be processed, stored in a DocumentTaxonomy variable. This object can be obtained by using a Load Taxonomy activity. This field supports only DocumentTaxonomy variables.
- ClassificationResult - The result obtained after running a classifier activity on the specified document, stored in a ClassificationResult object. This field is optional if you specify a Document Type ID instead. This field supports only ClassificationResult variables.
- DocumentTypeID - The Document Type ID, as found in the Taxonomy Manager. This field is optional if you provide a value in the ClassificationResult field. This field supports only strings and String variables.
Output
- ExtractionResults - The extraction results generated by the data extraction process, stored in an ExtractionResult variable.
  Note: If the page range for data extraction indicates that only a part of the original file is targeted, the Data Extraction Scope generates a file in the TEMP project folder that is then passed to the extractors. The temporary file contains only the page range that extractors should receive for document processing.
Properties panel
Authentication
The Authentication properties of this activity allow you to perform auto-validation via on-premises robots. Before configuring these properties, ensure you have fulfilled the prerequisites mentioned in the Configuring Authentication page. Once these steps are completed, you can then proceed to fill in the Authentication properties of the activity.
- Runtime Credentials Asset - Use this field when you need to access Document Understanding auto-validation features while the robot is connected to a local Orchestrator, or from a different tenant. You can enter a Credential Asset, for authentication purposes, in one of the following ways:
  - From the dropdown list, select the desired Credential Asset from the Orchestrator to which the UiPath® Robot is connected.
  - If you have stored the external application credentials for accessing auto-validation features in an Orchestrator Credential Asset, manually enter the path to that asset. The path should be in the following format: <OrchestratorFolderName>/<AssetName>.
- Runtime Tenant Url - Use this field alongside the Runtime Credentials Asset field. Enter the URL of the tenant that the robot will connect to in order to execute the auto-validation. The URL should be in the following format: https://<baseURL>/<OrganizationName>/<TenantName>.
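The two formats above can be read as simple string templates. The following is a minimal Python sketch for illustration only; it is not UiPath code, and the helper names and sample values are hypothetical.

```python
def credential_asset_path(orchestrator_folder_name: str, asset_name: str) -> str:
    """Build a path of the form <OrchestratorFolderName>/<AssetName>."""
    return f"{orchestrator_folder_name}/{asset_name}"


def runtime_tenant_url(base_url: str, organization_name: str, tenant_name: str) -> str:
    """Build a URL of the form https://<baseURL>/<OrganizationName>/<TenantName>."""
    return f"https://{base_url}/{organization_name}/{tenant_name}"


# Hypothetical sample values, chosen only to show the shape of each string.
print(credential_asset_path("Finance", "AutoValidationCredentials"))
print(runtime_tenant_url("cloud.example.com", "MyOrg", "MyTenant"))
```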
Common
- DisplayName - The display name of the activity.
Input
- ApplyAutoValidation - Adjust confidence using generative extraction cross-checking. If values are auto-validated, the confidence of those values will be set to the confidence threshold. Enabling this feature has additional AI unit consumption.
- ClassificationResult - The result obtained after running a classifier activity on the specified document, stored in a ClassificationResult object. This field is optional if you specify a Document Type ID instead. This field supports only ClassificationResult variables.
- DocumentObjectModel - The Document Object Model you want to use to validate the document against. This model is stored in a Document variable and can be retrieved from the Digitize Document activity. Visit Digitize Document for more information on how to achieve this. This field supports only Document variables.
- DocumentPath - The path to the document you want to validate. This field supports only strings and String variables.
  Note: The supported file types for this property field are .png, .gif, .jpe, .jpg, .jpeg, .tiff, .tif, .bmp, and .pdf.
- DocumentText - The text of the document itself, stored in a String variable. This value can be retrieved from the Digitize Document activity. Visit Digitize Document for more information on how to achieve this. This field supports only strings and String variables.
- DocumentTypeID - The Document Type ID, as found in the Taxonomy Manager. This field is optional if you provide a value in the ClassificationResult field. This field supports only strings and String variables.
- FormatValuesIfPossible - Specifies that a value which already has derived parts reported is not overridden by the Data Extraction Scope, while for a value without derived parts the Data Extraction Scope tries to compute them. If the option is set to False, the values are not formatted.
- AutoValidationConfidenceThreshold - Confidence threshold for generative validation. Only field values with confidence below this threshold are validated. If values are confirmed, the confidence of those values is set to this threshold.
- Taxonomy - The Taxonomy against which the document is to be processed, stored in a DocumentTaxonomy variable. This object can be obtained by using a Load Taxonomy activity. This field supports only DocumentTaxonomy variables.
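The interaction between ApplyAutoValidation and AutoValidationConfidenceThreshold can be sketched as follows. This is a hedged Python illustration of the rule as described in the list above, not the activity's actual implementation: the function name, the data shape, and the `is_confirmed` callback (standing in for the generative cross-check) are all assumptions.

```python
from typing import Callable, Dict, Tuple

# Hypothetical shape: field name -> (extracted value, confidence).
FieldValues = Dict[str, Tuple[str, float]]


def apply_auto_validation(
    fields: FieldValues,
    threshold: float,
    is_confirmed: Callable[[str, str], bool],
) -> FieldValues:
    """Raise the confidence of confirmed low-confidence values to the threshold."""
    adjusted: FieldValues = {}
    for name, (value, confidence) in fields.items():
        # Only values with confidence below the threshold are validated.
        if confidence < threshold and is_confirmed(name, value):
            # A confirmed value has its confidence set to the threshold.
            adjusted[name] = (value, threshold)
        else:
            # Unconfirmed values and values at or above the threshold are unchanged.
            adjusted[name] = (value, confidence)
    return adjusted
```

Under this reading, a value that already meets the threshold is never sent for validation, and a value that fails the cross-check keeps its original confidence.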
Misc
- Private - If selected, the values of variables and arguments are no longer logged at Verbose level.
Output
- ExtractionResults - The extraction results generated by the data extraction process, stored in an ExtractionResult variable.
  Note: If the page range for data extraction indicates that only a part of the original file is targeted, the Data Extraction Scope generates a file in the TEMP project folder that is then passed to the extractors. The temporary file contains only the page range that extractors should receive for document processing.
Using the Configure Extractors wizard
The Configure Extractors Wizard can be accessed via the Data Extraction Scope and allows you to choose which extractors are applied to each document type and field.
From the body of the activity, select Configure Extractors. The wizard button becomes available after dragging at least one extractor activity into the body of the Data Extraction Scope activity. This wizard displays all the document types defined in the taxonomy and their respective fields, and enables you to choose which extractor you want to use for each.
Figure 1. Overview of the Configure Extractors wizard

In the wizard, you can expand each document type to view its fields and select the ones you want to extract.
Figure 2. The selection of an extractor for a document type in the Configure Extractors wizard

The Framework Alias field can be used to map an extractor to one or more trainers. For instance, you can give a Machine Learning Extractor the alias R2D2 and then use the same alias for a Machine Learning Extractor Trainer. This creates a link between the extractor and the trainer, which is used for training the extractor. Each extractor has a unique alias, while multiple trainers can share the same alias.
You can set the Minimum Confidence field to a confidence threshold between 0 and 100. The predicted value for a field is considered only if the prediction's confidence score is equal to or higher than the configured Minimum Confidence. If a prediction's confidence score is lower than the Minimum Confidence threshold, the predicted value is not stored in the output of the Data Extraction Scope activity.
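The Minimum Confidence rule amounts to a simple filter. The sketch below is a Python illustration only, assuming a hypothetical dictionary of predictions with confidence scores on the same 0 to 100 scale; the function and field names are not part of the activity.

```python
from typing import Dict, Tuple

# Hypothetical shape: field name -> (predicted value, confidence from 0 to 100).
Predictions = Dict[str, Tuple[str, float]]


def filter_by_minimum_confidence(
    predictions: Predictions, minimum_confidence: float
) -> Predictions:
    """Keep only predictions whose confidence is equal to or higher than the minimum."""
    return {
        name: (value, confidence)
        for name, (value, confidence) in predictions.items()
        if confidence >= minimum_confidence
    }


predictions = {
    "InvoiceNumber": ("INV-001", 92),
    "Total": ("100.00", 55),
    "Date": ("2024-01-01", 70),
}
# With a minimum of 70, "Total" is dropped; "Date" is kept because 70 >= 70.
kept = filter_by_minimum_confidence(predictions, 70)
```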
You can identify an optimal confidence level by testing various documents within your workflow, recording the results (for example, in an Excel spreadsheet), and then analyzing which threshold value gives the most accurate results.
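One possible way to analyze such recorded results is to compare candidate thresholds by how many predictions they keep and how accurate the kept predictions are. The Python sketch below assumes a hypothetical list of (confidence, was_correct) pairs collected from manual review; the metric names and sample data are illustrative, not something the activity produces.

```python
from typing import List, Tuple


def summarize_threshold(
    results: List[Tuple[float, bool]], threshold: float
) -> Tuple[float, float]:
    """Return (share of predictions kept, accuracy of the kept predictions)."""
    kept = [correct for confidence, correct in results if confidence >= threshold]
    if not kept:
        # Nothing passes the threshold, so both measures are reported as zero.
        return 0.0, 0.0
    coverage = len(kept) / len(results)
    accuracy = sum(kept) / len(kept)
    return coverage, accuracy


# Hypothetical review data: (confidence, whether the prediction was correct).
recorded = [(95, True), (80, True), (60, False), (40, False)]
for candidate in (50, 70, 90):
    coverage, accuracy = summarize_threshold(recorded, candidate)
    print(candidate, coverage, accuracy)
```

A higher threshold typically keeps fewer predictions but rejects more wrong ones, so the choice depends on how much manual validation is acceptable for your process.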
Select Get or refresh extractor capabilities, for the extractors that support this functionality, to easily map your taxonomy fields to the available extractor fields, or to refresh them if the extractor fields have changed.
Selecting the check box next to a field in any column causes the Data Extraction Scope to request that particular field from the corresponding extractor. If the check box is cleared, the Data Extraction Scope does not request a value for that field from the extractor.
Using the text input next to each field, you can map a field defined in your taxonomy to a field defined in the extractor's internal taxonomy, if one exists. For regular fields, enter the identifier of the target field from the extractor's internal taxonomy in the text input. For table fields, map the parent table field at the table level and map each of its columns individually.
When using the Machine Learning Extractor in a setup with defined Column Fields, these can be mapped to a table field from your Taxonomy. They will be displayed under a collection called items.
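The number of columns in the wizard varies with the number of extractors present in the scope activity. The name of each column is given by the display name of the corresponding extractor activity.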
Figure 3. Multiple extractors present in the Configure Extractors wizard

If multiple extractors are used in the activity, the order of the extractors in the scope defines their priority. For example, consider three extractors. If Extractor 1 returns an acceptable value (one that is at or above the Minimum Confidence level) for a particular requested field, that field is not requested when Extractor 2 and Extractor 3 are executed. If Extractor 1 and Extractor 2 return values below the Minimum Confidence level for that field, or return nothing at all, the results from Extractor 3 are taken into account, provided they satisfy the confidence acceptability conditions.
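The priority behavior described above can be sketched as a fallback loop. This is a simplified Python illustration under stated assumptions, not the activity's implementation: each extractor is modeled as a hypothetical callable paired with its Minimum Confidence, and every extractor is assumed to be enabled for every requested field.

```python
from typing import Callable, Dict, List, Tuple

Prediction = Tuple[str, float]  # (value, confidence)
# Hypothetical extractor: receives the requested field names, returns what it found.
Extractor = Callable[[List[str]], Dict[str, Prediction]]


def run_in_priority_order(
    extractors: List[Tuple[Extractor, float]],
    requested_fields: List[str],
) -> Dict[str, Prediction]:
    """Ask each extractor, in order, only for fields that have no accepted value yet."""
    accepted: Dict[str, Prediction] = {}
    for extract, minimum_confidence in extractors:
        pending = [name for name in requested_fields if name not in accepted]
        if not pending:
            break  # Every field already has an acceptable value.
        for name, (value, confidence) in extract(pending).items():
            # Accept a value only if it meets this extractor's Minimum Confidence.
            if name in pending and confidence >= minimum_confidence:
                accepted[name] = (value, confidence)
    return accepted
```

In this sketch, a field accepted from an earlier extractor is never requested from a later one, which mirrors the Extractor 1, 2, and 3 example.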
Document Understanding integration
The Data Extraction Scope activity is part of the Document Understanding solutions. Visit the Document Understanding Guide for more information.