The UiPath Activities Guide

Data Extraction Scope

UiPath.IntelligentOCR.Activities.DataExtraction.DataExtractionScope

Provides a scope for extractor activities, enabling you to configure them according to the document types defined in your taxonomy. The output of the activity is stored in an ExtractionResult variable, which contains all automatically extracted data and can be used as input for the Export Extraction Results activity. This activity also features a Configure Extractors wizard, which lets you specify exactly which fields to extract from the document types defined in the taxonomy.

Properties

Common

  • DisplayName - The display name of the activity.

Input

  • ClassificationResults - The results of running a classifier activity on the specified document, stored in a ClassificationResult object. This field is optional if you specify a DocumentTypeId instead. This field supports only ClassificationResult variables.
  • DocumentObjectModel - The Document Object Model of the document you want to extract data from. This model is stored in a Document variable and can be retrieved from the Digitize Document activity. Please see the documentation of that activity for more information on how to do this. This field supports only Document variables.
  • DocumentPath - The path to the document you want to extract data from. This field supports only strings and String variables.

Note: The supported file types for this property field are .png, .gif, .jpe, .jpg, .jpeg, .tiff, .tif, .bmp, and .pdf.

  • DocumentText - The text of the document itself, stored in a String variable. This value can be retrieved from the Digitize Document activity. Please see the documentation of the activity for more information on how to do this. This field supports only strings and String variables.
  • DocumentTypeId - The Document Type ID, as found in the Taxonomy Manager. This field is optional if you specify a variable in the ClassificationResults field. This field supports only strings and String variables.
  • FormatValuesIfPossible - If a value already has derived parts reported, the Data Extraction Scope does not override them; if it has no derived parts, the Data Extraction Scope tries to compute them. If this option is set to False, the values are not formatted.
  • Taxonomy - The Taxonomy against which the document is to be processed, stored in a DocumentTaxonomy variable. This object can be obtained by using a Load Taxonomy activity. This field supports only DocumentTaxonomy variables.
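
The input rules above can be sketched as a pre-flight check run before the activity is invoked. This is only an illustration in Python, not part of the UiPath API: the function name `check_scope_inputs` and its parameters are hypothetical, the case-insensitive extension match is an assumption, and the "at least one of ClassificationResults or DocumentTypeId" rule is inferred from each field being described as optional when the other is set.

```python
from pathlib import Path
from typing import List, Optional

# File types listed in the note on DocumentPath.
SUPPORTED_EXTENSIONS = {
    ".png", ".gif", ".jpe", ".jpg", ".jpeg", ".tiff", ".tif", ".bmp", ".pdf",
}


def check_scope_inputs(
    document_path: str,
    document_type_id: Optional[str] = None,
    classification_results: Optional[object] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the checks passed.

    Hypothetical helper: it mirrors the documented rules only and does not
    call any UiPath code.
    """
    problems = []

    # Assumption: extensions are compared case-insensitively.
    if Path(document_path).suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append("unsupported file type: " + document_path)

    # Each of the two fields is optional only when the other is provided.
    if not document_type_id and classification_results is None:
        problems.append("provide either ClassificationResults or DocumentTypeId")

    return problems
```

For example, `check_scope_inputs("invoice.pdf", document_type_id="some.type.id")` reports no problems, while `check_scope_inputs("notes.docx")` reports both an unsupported file type and the missing document type information.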

Misc

  • Private - If selected, the values of variables and arguments are no longer logged at Verbose level.

Output

  • ExtractionResults - The extraction results of the data extraction process, stored in an ExtractionResult variable.

Note: If the page range for data extraction indicates that only a part of the original file is targeted, the Data Extraction Scope generates a file in the TEMP project folder that is then passed to the extractors. The temporary file contains only the page range that extractors should receive for document processing.
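
The page-range behavior described in this note can be pictured with a small Python sketch. It is a conceptual model only: the function `pages_for_extractors` is hypothetical, pages are represented as plain strings, and the 1-based inclusive range is an assumption. It does not reproduce how the activity actually writes the temporary file.

```python
from typing import List, Tuple


def pages_for_extractors(
    pages: List[str], first_page: int, last_page: int
) -> Tuple[List[str], bool]:
    """Select the pages in a 1-based inclusive range.

    Returns the selected pages and a flag that is True when the range
    covers only part of the document, which is the case in which a
    temporary file would be generated.
    """
    if not 1 <= first_page <= last_page <= len(pages):
        raise ValueError("page range is outside the document")

    selected = pages[first_page - 1:last_page]
    is_partial = len(selected) < len(pages)
    return selected, is_partial
```

With a three-page document, a range of 2 to 3 selects the last two pages and flags the result as partial, whereas a range of 1 to 3 selects every page and no partial copy is needed.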

Learn More

To learn more about Data Extraction Scope, see the Document Understanding Guide.
