UiPath Documentation

Document Understanding activities

Last updated April 22, 2026

Extracting anchor-based data using Intelligent Form Extractor

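The following example explains how to extract data from forms that can also contain handwritten text. The use case scenario below shows how to extract data from a purchase order.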

The example uses activities such as Digitize Document, Data Extraction Scope, and Intelligent Form Extractor, which are available in the UiPath.IntelligentOCR.Activities package.

Creating the workflow

Before you create the workflow, install the following packages:

  • UiPath.DocumentProcessing.Contracts.Activities
  • UiPath.IntelligentOCR.Activities
  • UiPath.OCR.Activities
  • UiPath.OCR.Contracts
  • UiPath.WebAPI.Activities

Steps:

  1. Open Studio and create a new Process.

  2. Add a Sequence container in the Workflow Designer, name it Sequence1, and create the variables shown in the following table:

    Table 1. Variables to be created

    Variable                  Type                      Default value
    item                      String                    N/A
    classificationResult      ClassificationResult[]    N/A
    outputFileName            GenericValue              N/A
  3. Add another Sequence container in the Workflow Designer, after the first one, name it Sequence2, and create the variables shown in the following table:

    Table 2. Variables to be created

    Variable                  Type                      Default value
    text                      String                    N/A
    taxonomy                  DocumentTaxonomy          N/A
    dom                       Document                  N/A
    documentPath              String                    N/A
    classificationResult2     ClassificationResult[]    N/A
    outputFileName2           GenericValue              N/A
  4. Add a Message Box activity inside Sequence2.

    • In the Properties panel, select the Ok option from the Buttons dropdown. Add the following message in the Text field: "Select a PDF file".
  5. Select the TopMost check box. This brings the message box to the foreground.

  6. Add a Select File activity after the Message Box activity.

    • In the Properties panel, add the following text in the Filter field: Pdf files (*.pdf)|*.pdf
    • Add the documentPath variable in the SelectedFile field.
  7. Add an Assign activity after the Select File activity.

    • Add the outputFileName2 variable in the To field.
    • Add the expression ".temp/" + Path.GetFileName(documentPath) in the Value field.
  8. Add a Deserialize JSON activity after the Assign activity.

    • Add the expression File.ReadAllText("DocumentProcessing\taxonomy.json") in the JSON String field.
    • In the Properties panel, select the UiPath.DocumentProcessing.Contracts.Taxonomy.DocumentTaxonomy option from the TypeArgument dropdown list.
    • Add the taxonomy variable in the JsonObject field.
  9. Add a Digitize Document activity after the Deserialize JSON activity.

    • In the Properties panel, add the value 1 in the DegreeOfParallelism field.
    • Add the documentPath variable in the DocumentPath field.
    • Add the dom variable in the DocumentObjectModel field.
    • Add the text variable in the DocumentText field.
    • Add the UiPath® Document OCR engine inside the activity.
    • Add your API Key in the ApiKey field.
    • Add the "https://du.uipath.com/ocr" expression in the Endpoint field.
  10. Add a Write Text File activity after the Digitize Document activity.

    • Add the JsonConvert.SerializeObject(dom) expression in the Text field.
    • Add the outputFileName2 + ".dom.json" expression in the FileName field.
  11. Add another Write Text File activity after the previous Write Text File activity.

    • Add the text variable in the Text field.
    • Add the outputFileName2 + ".text.txt" expression in the FileName field.
  12. Drag another Sequence container in the Workflow Designer, name it Sequence3, and create the variables shown in the following table:

    Table 3. Variables to be created

    Variable                  Type                      Default value
    extractionResult          ExtractionResult          N/A
    validatedResults          ExtractionResult          N/A
    doubleValidatedResults    ExtractionResult          N/A
    dataset                   DataSet                   N/A
    i                         Int32                     N/A
  13. Add a Data Extraction Scope activity inside Sequence3.

    • In the Properties panel, add the dom variable in the DocumentObjectModel field.
    • Add the documentPath variable in the DocumentPath field.
    • Add the text variable in the DocumentText field.
    • Add the "All.Benchmarks.Invoice" expression in the DocumentTypeId field.
    • Add the taxonomy variable in the Taxonomy field.
    • Add the extractionResult variable in the ExtractionResults field.
  14. Add an Intelligent Form Extractor activity inside the Data Extraction Scope activity.

    • Add your API Key in the ApiKey field.
  15. Add a Write Text File activity after the Data Extraction Scope activity.

    • Add the JsonConvert.SerializeObject(extractionResult) expression in the Text field.
    • Add the outputFileName2 + ".results.json" expression in the FileName field.
  16. Add a Present Validation Station activity after the Write Text File activity.

    • Add the extractionResult variable in the AutomaticExtractionResults field.
    • Add the dom variable in the DocumentObjectModel field.
    • Add the documentPath variable in the DocumentPath field.
    • Add the text variable in the DocumentText field.
    • Add the taxonomy variable in the Taxonomy field.
    • Add the validatedResults variable in the ValidatedExtractionResults field.
  17. Add a Write Text File activity after the Present Validation Station activity.

    • Add the JsonConvert.SerializeObject(validatedResults) expression in the Text field.
    • Add the outputFileName2 + ".savedinVS.results.json" expression in the FileName field.
  18. Add another Write Text File activity after the previous Write Text File activity.

    • Add the JsonConvert.SerializeObject(doubleValidatedResults) expression in the Text field.
    • Add the outputFileName2 + ".doubleSavedinVS.results.json" expression in the FileName field.
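  19. Run the process. The automation should extract the data, open Validation Station so that you can validate it, and store the results in the Output folder.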
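The Assign expression in step 7 and the FileName expressions in steps 10, 11, 15, 17, and 18 together determine where each intermediate result is written. The Python sketch below only reproduces that naming scheme so that you can predict the output files for a given input document; it does not call any UiPath API, and the `output_files` helper, the `ARTIFACT_SUFFIXES` keys, and the sample path are hypothetical.

```python
import ntpath

# Suffixes appended by the Write Text File activities in steps 10, 11, 15, 17, and 18.
ARTIFACT_SUFFIXES = {
    "dom": ".dom.json",                                   # step 10: serialized dom
    "text": ".text.txt",                                  # step 11: OCR text
    "extraction": ".results.json",                        # step 15: extractionResult
    "validated": ".savedinVS.results.json",               # step 17: validatedResults
    "double_validated": ".doubleSavedinVS.results.json",  # step 18: doubleValidatedResults
}


def output_files(document_path: str) -> dict[str, str]:
    """Return the file name each Write Text File step writes for one document."""
    # Step 7: ".temp/" + Path.GetFileName(documentPath). ntpath.basename splits on
    # both "\" and "/" and keeps the extension, like Path.GetFileName on Windows.
    base = ".temp/" + ntpath.basename(document_path)
    return {key: base + suffix for key, suffix in ARTIFACT_SUFFIXES.items()}


for key, name in output_files(r"C:\Invoices\po_2041.pdf").items():
    print(f"{key:17} {name}")
```

Because every name is built from the same base, all five artifacts for one input PDF sort together in the output folder.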

Visit the following link to download the example in a ZIP format: Example.

Defining the taxonomy

You have created your workflow, defined all variables, and customized all activities. Now it's time to define your taxonomy. Visit Load Taxonomy to learn about defining your own taxonomy.

Create a taxonomy for extracting information from an invoice. It needs an Invoice document type with the fields shown in the following table:

Table 4. Invoice document type fields

Field             Type
Invoice number    Text
Subtotal          Number
Sales tax         Number
Total             Number
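Table 4 distinguishes Text fields from Number fields. As a rough illustration of what that distinction means for downstream processing, the Python sketch below converts raw extracted strings into typed values. It is not UiPath's taxonomy format or the parsing that Intelligent Form Extractor performs; the `FieldDef` type, the `parse_invoice` helper, and the assumption of "," thousands separators and "." decimal points are all hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class FieldDef:
    name: str
    field_type: str  # "Text" or "Number", as in Table 4


# Hypothetical in-memory mirror of the Invoice document type.
INVOICE_FIELDS = (
    FieldDef("Invoice number", "Text"),
    FieldDef("Subtotal", "Number"),
    FieldDef("Sales tax", "Number"),
    FieldDef("Total", "Number"),
)


def parse_value(field: FieldDef, raw: str):
    """Convert one raw extracted string according to the field type."""
    if field.field_type == "Number":
        # Assumes "," as thousands separator and "." as decimal point.
        return Decimal(raw.strip().replace(",", ""))
    return raw.strip()


def parse_invoice(raw_values: dict[str, str]) -> dict[str, object]:
    """Parse every Invoice field present in raw_values, keyed by field name."""
    return {
        f.name: parse_value(f, raw_values[f.name])
        for f in INVOICE_FIELDS
        if f.name in raw_values
    }


print(parse_invoice({"Invoice number": " INV-0042 ", "Subtotal": "1,250.00", "Total": "1,350.00"}))
```

Decimal is used instead of float so that monetary amounts keep their exact digits.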

Figure 1. Overview of the finished taxonomy with the previously mentioned fields


Creating the template

It is now time to create the template for the extraction process. Visit Load Taxonomy to learn how to create a template.

For this example, configure the template with the following values:

  • Document Type: Invoice.
  • Template Name: Invoice-example.
  • Template Document: Select the target file.
  • OCR Engine: Microsoft OCR.
  • Languages: en.
  • Profile: Scan.
  • Scale: 1.

Figure 2. Animated image example showing the configuration of the template


Setting anchors in the template

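Anchors are useful when you need to extract precise information from a document. When you define extraction zones relative to anchors, you can expect high accuracy in data extraction.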

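After you define the taxonomy and create the template, you can start configuring the template with anchors. This means defining extraction zones as boxes and using anchors to define the position of each box.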

Before you start adding anchors to the template, review the following recommendations:

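  • Make the anchor box as large as possible (height and width) so that it covers any invoice number format: long, short, large font, and so on.
  • An extraction zone can have as many anchors as needed, but only one of them (the first) can be the main anchor.
  • Use anchors made of several side-by-side words.
  • Place the main anchor as close as possible to the extraction zone.
  • The positions of the extraction zone and the main anchor are fixed in the template, even when the template is applied to different documents. Only the distance between the main anchor and the secondary anchors can vary.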
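A simple way to picture the last recommendation is a fixed offset: the template stores where the extraction zone sits relative to the main anchor, and on a new document the zone is placed at that same offset from wherever the main anchor is found. The Python sketch below is a conceptual model under that assumption, not the algorithm Intelligent Form Extractor actually uses; the `Box` type, the `locate_zone` function, and the coordinates are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def locate_zone(template_anchor: Box, template_zone: Box, found_anchor: Box) -> Box:
    """Place the extraction zone relative to where the main anchor was found."""
    # The offset between the main anchor and the zone is fixed by the template.
    dx = template_zone.x - template_anchor.x
    dy = template_zone.y - template_anchor.y
    # The zone keeps the template's size; only its position follows the anchor.
    return Box(found_anchor.x + dx, found_anchor.y + dy, template_zone.w, template_zone.h)


# Template: "Invoice Number" anchor with the value zone to its right.
anchor_in_template = Box(100, 50, 120, 20)
zone_in_template = Box(240, 50, 150, 20)
# New document: the same anchor was found 10 px right and 30 px lower.
print(locate_zone(anchor_in_template, zone_in_template, Box(110, 80, 120, 20)))
# Box(x=250, y=80, w=150, h=20)
```

Under this model, any difference in scale or skew between the template and a new document shifts the zone by an error that grows with the anchor-to-zone offset. That is one way to read the advice to keep the main anchor close to the extraction zone and to make the box large enough to absorb variation.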

Let's continue configuring the template and see how to use anchors to extract data.

  1. Set the extraction zone:
    • In the right area of the Validation Station, select Selection modes.

    • Select Anchor.

    • Start selecting the desired area.

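      Note:

      The main anchor should contain two or three words for better accuracy and results during extraction.

      When marking an anchor, select multiple words by holding the CTRL key and selecting the desired words.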

  2. Set the main anchor:
    1. While still in the Anchor selection mode, select the desired area as your main anchor.
    2. Select Extract value for the desired field.
  3. Set the secondary anchors:
    1. Make sure you are still in Anchor selection mode, with the main anchor selection active.
    2. Select a new area for the secondary anchor.
    3. Select Options for the desired field, and then select Change extracted value.

Repeat this process until you have defined all extraction zones and added all anchors. When you are done, save the template.
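The recommendation to build the main anchor from two or three side-by-side words is easier to see with a toy example. The Python sketch below searches hypothetical OCR output (a list of words with bounding boxes, in reading order) for a run of consecutive words. A single common word such as "Invoice" can match in several places, while a phrase such as "Invoice Number" is more likely to identify one location. This is an illustration only, not the matching logic used by Intelligent Form Extractor; the `Word` type, the `find_anchor` function, and the sample page are hypothetical.

```python
from typing import NamedTuple, Sequence


class Word(NamedTuple):
    text: str
    x: float
    y: float
    w: float
    h: float


def find_anchor(words: Sequence[Word], phrase: Sequence[str]) -> list[tuple[float, float, float, float]]:
    """Return the union box (x, y, w, h) of every consecutive run matching phrase."""
    target = [p.lower() for p in phrase]
    n = len(target)
    if n == 0:
        return []
    matches = []
    for i in range(len(words) - n + 1):
        run = words[i:i + n]
        # Case-insensitive comparison of the whole run against the phrase.
        if [w.text.lower() for w in run] == target:
            left = min(w.x for w in run)
            top = min(w.y for w in run)
            right = max(w.x + w.w for w in run)
            bottom = max(w.y + w.h for w in run)
            matches.append((left, top, right - left, bottom - top))
    return matches


page = [
    Word("Invoice", 10, 10, 60, 12),
    Word("Number", 75, 10, 55, 12),
    Word("INV-0042", 140, 10, 70, 12),
    Word("Invoice", 10, 30, 60, 12),
    Word("Date", 75, 30, 35, 12),
]
print(len(find_anchor(page, ["Invoice"])))       # 2: ambiguous single-word anchor
print(find_anchor(page, ["Invoice", "Number"]))  # [(10, 10, 120, 12)]
```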
