Document Understanding Activities
Machine Learning Extractor Trainer
UiPath.DocumentUnderstanding.ML.Activities.MachineLearningExtractorTrainer
Description
Enables the collection of data that has been processed through Validation Station so that it can be imported into Document Manager. This activity can be used only within the Train Extractors Scope activity.
Project compatibility
Windows - Legacy | Windows
Configuration
Designer panel
Local storage
- Output Folder - The directory where the collected data is stored. Once the data is stored, it can be imported into machine learning training tools.
Select a project's private dataset
- Dataset - The dataset where the training data can be uploaded. If the robot is connected to a tenant that has AI Center enabled, the dropdown menu lists all the datasets from AI Center, and you can select the folder in which to upload the validated documents.
- Project - The project where the training data can be uploaded.
Note:
Project and dataset selection are enabled only when connected to Orchestrator. Visit Managing datasets for more information about public and private datasets.
Provide a public dataset endpoint
- Dataset ApiKey - The authentication key of the dataset.
- Dataset Endpoint - The endpoint of the dataset where training data can be uploaded. Once a dataset is public, it can be accessed from outside the UiPath® environment through an endpoint and an API key. Use this option if you want to upload data to an AI Center instance that you are not connected to (for example, in hybrid deployments where AI Center is in the cloud and the robot is connected to an on-premises tenant).
Properties panel
Common
- DisplayName - The display name of the activity.
Local storage
- Output Folder - The directory where the collected data is stored. Once the data is stored, it can be imported into machine learning training tools.
Misc
- Private - If selected, the values of variables and arguments are no longer logged at Verbose level.
Provide a public dataset endpoint
- Dataset ApiKey - The authentication key of the dataset.
- Dataset Endpoint - The endpoint of the dataset where training data can be uploaded. Once a dataset is public, it can be accessed from outside the UiPath® environment through an endpoint and an API key. Use this option if you want to upload data to an AI Center instance that you are not connected to (for example, in hybrid deployments where AI Center is in the cloud and the robot is connected to an on-premises tenant).
Select a project's private dataset
- Dataset - The dataset where the training data can be uploaded. If the robot is connected to a tenant that has AI Center enabled, the dropdown menu lists all the datasets from AI Center, and you can select the folder in which to upload the validated documents.
- Project - The project where the training data can be uploaded.
Note:
Project and dataset selection are enabled only when connected to Orchestrator. Visit Managing datasets for more information about public and private datasets.
Server
- RetryOnFailure - Retry on transient failure. This field only supports Boolean values (True, False). The default value is True.
- Timeout (milliseconds) - Specifies the amount of time (in milliseconds) to wait for a response from the server before an error is thrown. The default value is 100000 milliseconds (100 seconds).
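As a rough illustration of how the two Server properties interact, the following Python sketch wraps a request in a retry loop with a per-call timeout. It is not the activity's implementation: the function name `call_with_retry`, the `max_attempts` and `backoff_s` parameters, and the choice of which exceptions count as transient are all assumptions made for the example. Only the default timeout of 100000 milliseconds and the default of retrying on failure come from the property descriptions above.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    request: Callable[[float], T],
    timeout_ms: int = 100_000,      # documented default: 100000 ms (100 seconds)
    retry_on_failure: bool = True,  # documented default: True
    max_attempts: int = 3,          # assumption: the real retry count is not documented
    backoff_s: float = 0.0,         # assumption: the real wait between retries is not documented
) -> T:
    """Call request(timeout_in_seconds), retrying on transient errors if enabled."""
    timeout_s = timeout_ms / 1000.0
    attempts = max_attempts if retry_on_failure else 1
    for attempt in range(1, attempts + 1):
        try:
            return request(timeout_s)
        except (TimeoutError, ConnectionError):
            # Treating these two exception types as transient is an assumption.
            if attempt == attempts:
                raise  # out of attempts: surface the error to the caller
            time.sleep(backoff_s)
    raise ValueError("max_attempts must be at least 1")
```

With `retry_on_failure=False` the request is attempted exactly once, so a timeout surfaces immediately as an error.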
Using the Machine Learning Extractor Trainer wizard
The Machine Learning Extractor Trainer collects human feedback for you in a directory of your choice. When you have collected data and want to retrain an ML model, zip the contents of that directory and upload the archive to Document Manager, where you can gather and filter the data.
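The zipping step can be scripted. The sketch below is one possible way to do it in Python with the standard library; the function name `zip_training_data` and both path arguments are hypothetical, and the upload to Document Manager itself is not covered. Keep the archive outside the output folder so that the zip file does not end up inside itself.

```python
import shutil
from pathlib import Path


def zip_training_data(output_folder: str, archive_base: str) -> Path:
    """Zip the contents of output_folder into <archive_base>.zip and return its path.

    output_folder: the directory set in the activity's Output Folder property.
    archive_base:  target path without the .zip extension (hypothetical name),
                   located outside output_folder.
    """
    source = Path(output_folder)
    if not source.is_dir():
        raise NotADirectoryError(f"Output folder not found: {source}")
    # root_dir=source archives the folder's contents rather than the folder itself.
    archive_path = shutil.make_archive(archive_base, "zip", root_dir=str(source))
    return Path(archive_path)
```

For example, `zip_training_data("C:/DU/TrainerOutput", "C:/DU/export")` would produce `C:/DU/export.zip`; both paths are placeholders.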
How to use
To use the Machine Learning Extractor Trainer activity, perform the following steps:
- Use the Taxonomy Manager wizard to define document types and fields.
- Add a Machine Learning Extractor Trainer to a Train Extractors Scope activity.
- In the Machine Learning Extractor wizard that opens automatically, enter a value in the Endpoint field. You can choose one of the public endpoints. Visit Public endpoints for more information.
- Select the Update activity arguments check box if you also want to use the entered values as input arguments for the activity, specifically for the Endpoint.
- Select Get Capabilities. The wizard closes after this operation.
- Enter a value for the Output Folder.
- Select the Configure Extractors option in the Train Extractors Scope. A wizard is displayed.
Figure 1. The Configure Extractors wizard
- The Machine Learning Extractor Trainer is now ready for configuration. Expand the document type that you want to apply it to, and select the check boxes next to the fields you want to train.
- Fill in the text boxes either manually or by selecting, from the available dropdown list, the data you want to map to each field. The dropdown list contains all the fields that the Machine Learning Extractor Trainer declares as extraction capabilities, using the endpoint entered in the Machine Learning Extractor wizard.
Note: If you select the check box but leave the text box empty, the text box is automatically filled in with the Document Type ID from the local taxonomy. The changes apply after saving. To avoid using a long string for the field ID, we recommend entering a value manually if you do not have access to the internal taxonomy of the extractor.
- To check whether you are using the latest capabilities of the extractor, select Get or refresh extractor capabilities, which opens the Machine Learning Extractor wizard.
- Selecting an option from the dropdown list automatically checks that field.
- To train an extractor based on its extraction results, set the Framework Alias field to the exact alphanumeric value previously used for the extractor.
- Select Save once all fields are configured properly.
Important: You cannot select the same option for two different fields.
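The two mapping rules above (a blank text box falls back to an ID from the local taxonomy, and no option may be used for two different fields) can be expressed as a small check. The Python sketch below is only an illustration of those rules, not part of the wizard: the function name `resolve_field_mapping`, the dictionary shapes, and the sample IDs are all hypothetical.

```python
from collections import Counter
from typing import Dict


def resolve_field_mapping(
    selected: Dict[str, str],
    local_ids: Dict[str, str],
) -> Dict[str, str]:
    """Resolve text-box values for the checked fields and reject duplicates.

    selected:  field name -> text-box value ('' when left blank).
    local_ids: field name -> ID taken from the local taxonomy (fallback value).
    """
    resolved = {}
    for field, value in selected.items():
        value = value.strip()
        # A blank text box is filled in from the local taxonomy.
        resolved[field] = value if value else local_ids[field]

    # The same option cannot be selected for two different fields.
    counts = Counter(resolved.values())
    duplicates = sorted(v for v, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"Option mapped to more than one field: {duplicates}")
    return resolved
```

Running a check like this before selecting Save is one way to catch a duplicate mapping early; the wizard itself enforces the rule regardless.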
Document Understanding integration
The Machine Learning Extractor Trainer activity is part of the Document Understanding solutions. Visit the Document Understanding Guide for more information.