
Document Understanding User Guide

Last updated April 15, 2026

Retraining invoices with an additional field

Important:

The aim of this page is to help first-time users get familiar with Document Understanding™.

For scalable production deployments, we strongly recommend using the Document Understanding Process available in UiPath® Studio under the Templates section.

This quickstart shows you how to retrain the out-of-the-box Invoices ML model to extract one additional field.

Let’s use the same workflow we used for the receipts in the previous quickstart and modify it so it can support invoices.

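To do this, we need to perform the following steps in our workflow:

  1. Modify the taxonomy
  2. Add a classifier
  3. Add a Machine Learning Extractor
  4. Label the data
  5. Retrain the Invoices ML model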

1. Modify the taxonomy

In this step, we need to modify the taxonomy to add the Invoices document type.

To do so, open Taxonomy Manager and create a group named Semi Structured Documents, a category named Finance, and a document type named Invoices. Create the fields listed below with user-friendly names and their respective data types.

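  • Name - Text
  • Vendor Address - Address
  • Billing Name - Text
  • Billing Address - Address
  • Shipping Address - Address
  • Invoice Number - Text
  • PO Number - Text
  • Vendor VAT Number - Text
  • Date - Date
  • Tax - Number
  • Total - Number
  • Payment Terms - Text
  • Net Amount - Number
  • Due Date - Date
  • Discount - Number
  • Shipping Charges - Number
  • Payment Address - Address
  • Description - Text
  • Items - Table
    • Description - Text
    • Quantity - Number
    • Unit Price - Number
    • Line Amount - Number
    • Item PO Number - Text
    • Line Number - Text
    • Part Number - Text
    • Billing VAT Number - Text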
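
For reference, the taxonomy above can be written down as a small data structure, which is handy for checking that no field was skipped when you enter it in Taxonomy Manager. This is only an illustrative sketch: the `TaxonomyField` class and the `make_fields` helper are hypothetical names, not part of any UiPath API, and the English field names are assumed to match the list above.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class TaxonomyField:
    """Hypothetical stand-in for one field defined in Taxonomy Manager."""
    name: str
    data_type: str  # "Text", "Address", "Date", "Number", or "Table"
    columns: list = field(default_factory=list)  # only used by Table fields


def make_fields(pairs):
    """Build simple (non-table) fields from (name, data type) pairs."""
    return [TaxonomyField(name, data_type) for name, data_type in pairs]


# Columns of the Items table.
ITEM_COLUMNS = make_fields([
    ("Description", "Text"), ("Quantity", "Number"), ("Unit Price", "Number"),
    ("Line Amount", "Number"), ("Item PO Number", "Text"),
    ("Line Number", "Text"), ("Part Number", "Text"),
    ("Billing VAT Number", "Text"),
])

# Top-level fields of the Invoices document type, in the order listed above.
INVOICE_FIELDS = make_fields([
    ("Name", "Text"), ("Vendor Address", "Address"), ("Billing Name", "Text"),
    ("Billing Address", "Address"), ("Shipping Address", "Address"),
    ("Invoice Number", "Text"), ("PO Number", "Text"),
    ("Vendor VAT Number", "Text"), ("Date", "Date"), ("Tax", "Number"),
    ("Total", "Number"), ("Payment Terms", "Text"), ("Net Amount", "Number"),
    ("Due Date", "Date"), ("Discount", "Number"),
    ("Shipping Charges", "Number"), ("Payment Address", "Address"),
    ("Description", "Text"),
]) + [TaxonomyField("Items", "Table", ITEM_COLUMNS)]

# Count how many top-level fields use each data type.
type_counts = Counter(f.data_type for f in INVOICE_FIELDS)
print(len(INVOICE_FIELDS), dict(type_counts))
```

Comparing the printed counts with what you see in Taxonomy Manager is a quick way to spot a field that was missed or given the wrong data type.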

2. Add a classifier

In this step, we need to add a classifier so that the workflow can process both receipts and invoices.

Since our workflow now supports two document types, Receipts and Invoices, we need to add the classifier to differentiate between different document types coming in as input:

  1. Add a Classify Document Scope after the Digitize Document activity and provide the DocumentPath, DocumentText, DocumentObjectModel, and Taxonomy as input arguments and capture the ClassificationResults in a new variable. We need this variable to check what document(s) we are processing.
  2. We also need to specify one or more classifiers. In this example, we are using the Intelligent Keyword Classifier. Add it to the Classify Document Scope activity. This page helps you make an informed decision about which classification method to use in different scenarios.
  3. Train the classifier as described here.
  4. Configure the classifier by enabling it for both document types.
  5. Depending on your use case, you might want to validate the classification. You can do that using the Present Classification Station or the Create Document Classification Action and Wait For Document Classification Action And Resume activities.
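
To build some intuition for what a keyword-based classifier does with the digitized text, here is a toy sketch. It is not the algorithm used by the Intelligent Keyword Classifier, and the keyword lists, the `classify` function, and the confidence formula are all assumptions made up for illustration.

```python
# Hypothetical keyword lists; a trained classifier learns these from samples.
KEYWORDS = {
    "Invoices": ["invoice", "bill to", "payment terms", "due date"],
    "Receipts": ["receipt", "cashier", "change due", "thank you"],
}


def classify(text, keywords=KEYWORDS):
    """Return (document type, confidence) based on keyword hit counts.

    The confidence is the winning type's share of all keyword hits.
    Returns (None, 0.0) when no keyword appears in the text.
    """
    lowered = text.lower()
    scores = {
        doc_type: sum(lowered.count(word) for word in words)
        for doc_type, words in keywords.items()
    }
    total = sum(scores.values())
    if total == 0:
        return None, 0.0
    best = max(scores, key=scores.get)
    return best, scores[best] / total


invoice_text = "INVOICE No. 42\nBill to: ACME\nPayment terms: 30 days"
receipt_text = "Receipt\nCashier: 3\nThank you!"
print(classify(invoice_text))
print(classify(receipt_text))
```

A document with no keyword hits returns `None`, which corresponds to the situation where a human reviewer would need to confirm the document type.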

3. Add a Machine Learning Extractor

In this step, we need to add a Machine Learning Extractor to the Data Extraction Scope activity and connect it to the Invoices public endpoint.

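The process is exactly the same as for the Receipts Machine Learning Extractor that we added previously:

  1. Add a Machine Learning Extractor activity next to the Receipts Machine Learning Extractor.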

  2. Provide the Invoices public endpoint, namely https://du.uipath.com/ie/invoices, and an API key to the extractor.

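  3. Configure the extractor to work with invoices by mapping the fields created in Taxonomy Manager to the fields available in the ML model:

    Screenshot of the Configure Extractors dialog.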

  4. Do not forget to use the ClassificationResults variable output by the Classify Document Scope as input to the Data Extraction Scope, instead of specifying a DocumentTypeId. You should end up with something like this:

    Screenshot of the Data Extraction Scope dialog.

  5. Run the workflow to test that it processes invoices correctly.
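
The reason for passing ClassificationResults rather than a fixed DocumentTypeId is that the document type decides which extractor handles the document. The sketch below illustrates that routing idea only. In the real workflow the Data Extraction Scope does this for you; the `ClassificationResult` class, the `select_endpoint` function, the short document type keys, and the 0.5 confidence threshold are hypothetical, and the Receipts entry is a placeholder for whatever endpoint you configured in the previous quickstart.

```python
from dataclasses import dataclass


@dataclass
class ClassificationResult:
    """Hypothetical, simplified stand-in for one classification result."""
    document_type_id: str
    confidence: float


# Invoices endpoint taken from the step above; Receipts is a placeholder.
EXTRACTOR_ENDPOINTS = {
    "Invoices": "https://du.uipath.com/ie/invoices",
    "Receipts": "<receipts-endpoint-from-previous-quickstart>",
}


def select_endpoint(results, min_confidence=0.5):
    """Pick the extractor endpoint for the most confident classification."""
    if not results:
        raise ValueError("document was not classified")
    best = max(results, key=lambda r: r.confidence)
    if best.confidence < min_confidence:
        raise ValueError("classification confidence too low, validate manually")
    # Raises KeyError for a document type that has no extractor configured.
    return EXTRACTOR_ENDPOINTS[best.document_type_id]


results = [ClassificationResult("Invoices", 0.93)]
print(select_endpoint(results))
```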

4. Label the data

Before retraining the base Invoices ML model, we need to label data so that the model can support the new IBAN field.

  1. Collect the requirements and enough sample invoice documents for the complexity of the use case you need to solve. Label 50 pages, as explained on this documentation page.
  2. Gain access to an instance of Document Manager either on premises or in AI Center in the Cloud. Make sure you have the permissions to use Document Manager.
  3. Create an AI Center project, go to Data Labeling > UiPath Document Understanding, and create a Data Labeling session.
  4. Configure an OCR engine as described here, try importing a diverse set of your production documents, and make sure that the OCR engine reads the text you need to extract. More suggestions in this section. Only proceed to the next step after you have settled on an OCR engine.
  5. Create a fresh Document Manager session, and import a Training set and an Evaluation set, while making sure to check the Make this a Test set checkbox when importing the Evaluation set. More details about imports here.
  6. Create and configure the IBAN field as described here. More advanced guidelines are available in this section.
  7. Label a Training dataset and an Evaluation dataset as described here. The prelabeling feature of Document Manager described here can make the labeling work a lot easier.
  8. Export first the Evaluation set and then the Training set to AI Center by selecting them from the filter dropdown at the top of the Document Manager view. More details about exports here.
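
Before importing, the documents have to be divided into a Training set and an Evaluation set that do not overlap. One possible way to do this reproducibly is to hash each file name, as in the sketch below. The `assign_split` function, the 20 percent evaluation share, and the file names are assumptions for illustration; the source does not prescribe a split ratio.

```python
import hashlib


def assign_split(file_name, eval_percent=20):
    """Deterministically assign a file to 'training' or 'evaluation'.

    Hashing the name means the same file always lands in the same set,
    even if the folder is re-scanned or new files are added later.
    """
    digest = hashlib.sha256(file_name.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # value in 0..99
    return "evaluation" if bucket < eval_percent else "training"


# Hypothetical file names standing in for your sample invoices.
files = [f"invoice_{i:03d}.pdf" for i in range(50)]
splits = {name: assign_split(name) for name in files}
training = [n for n, s in splits.items() if s == "training"]
evaluation = [n for n, s in splits.items() if s == "evaluation"]
print(len(training), len(evaluation))
```

The two resulting lists can then be imported separately, with the Make this a Test set checkbox selected for the evaluation files.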

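Next, let's create our model, retrain it, and deploy it.

5. Retrain the Invoices ML model

Now that our workflow supports processing invoices, we need to extract the IBAN from them, which is a field that the out-of-the-box Invoices ML model does not pick up by default. This means that we need to retrain a new model, starting from the base one.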

  1. Create an ML Package as described here. If your document type is different from the ones available out-of-the-box, then choose the DocumentUnderstanding ML Package. Otherwise, use the package closest to the document type you need to extract.
  2. Create a Training Pipeline as described here using the Input dataset which you exported in the previous section from Document Manager.
  3. When the training is done and you have package minor version 1, run an Evaluation Pipeline on this minor version and inspect the side-by-side comparison in evaluation.xlsx. Use the detailed guidelines here.
  4. If the evaluation results are satisfactory, go to the ML Skills view and create an ML Skill using the new minor version of the ML Package. If you want to use this ML Skill for prelabeling in Document Manager, select the Modify Current Deployment button at the top right of the ML Skill view and toggle on Make ML Skill Public.
  5. After creating the ML Skill, we need to consume it in Studio. The easiest way to do that is to make the ML Skill public as described here. Then, the only thing left to do is to replace the Invoices ML model public endpoint that we initially added to the Machine Learning Extractor in our workflow with the public endpoint of the ML Skill.
  6. Run the workflow. You should see the newly added IBAN field extracted along with the default invoice fields.
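
When judging whether the evaluation results are satisfactory, the quantity of interest is how often each field, in particular the new IBAN field, is extracted correctly on the Evaluation set. The sketch below shows one simple way to compute a per-field exact-match accuracy. It does not reproduce the format or the metrics of evaluation.xlsx; the `field_accuracy` function and the sample values are made up for illustration.

```python
def field_accuracy(expected_docs, predicted_docs):
    """Return {field name: share of documents with an exact-match value}."""
    totals, hits = {}, {}
    for expected, predicted in zip(expected_docs, predicted_docs):
        for name, value in expected.items():
            totals[name] = totals.get(name, 0) + 1
            if predicted.get(name) == value:
                hits[name] = hits.get(name, 0) + 1
    return {name: hits.get(name, 0) / totals[name] for name in totals}


# Made-up labeled values and model predictions for two documents.
expected_docs = [
    {"IBAN": "XX00 TEST 0001", "Total": "100.00"},
    {"IBAN": "XX00 TEST 0002", "Total": "250.50"},
]
predicted_docs = [
    {"IBAN": "XX00 TEST 0001", "Total": "100.00"},
    {"IBAN": None, "Total": "250.50"},  # model missed the IBAN here
]
accuracy = field_accuracy(expected_docs, predicted_docs)
print(accuracy)
```

A low score on one field while the others stay high usually points to that field needing more or more consistent labels, rather than to a problem with the whole model.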

Download example

Download this sample project using this link. You need to change the Machine Learning Extractor for Invoices from Endpoint mode to your trained ML Skill.
