Document Understanding Activities
Last updated Apr 10, 2024

Extract Document Data

UiPath.IntelligentOCR.StudioWeb.Activities.ExtractDocumentDataWithDocumentData<UiPath.IntelligentOCR.StudioWeb.Activities.DataExtraction.ExtendedExtractionResultForDocumentData>

Extracts data from an input file or Document Data object, and stores the results into a Document Data object (either the one received as input or a newly created one for the input file).

Note:

The Extract Document Data activity requires a preceding activity that provides a Document Data object (produced as output by other Document Understanding activities, for example Classify Document).

The Extract Document Data activity accepts one of the following as input:
  • Document Data - from the Classify Document activity
  • File - from Get File/Folder or Get Newest Email activities

The languages supported by the generative models are the same as those supported by the OCR engine in use. For more information, check the OCR Supported languages page.

Project compatibility: Cross-platform

Properties

  • Project - Requires you to select your Document Understanding project from the drop-down menu. The available options are:
    • Predefined - The default project
    • You can create a custom project by going to Document Understanding.
  • Extractor - Requires you to select the Extractor from the selected project. For the Predefined Project, the available options are:
    • Any one of the available ML Packages
      Note: The Extract Document Data activity overrides the document type with the selected extractor. This is not applicable for generative models.
    • Generative
  • Prompt - This field appears if you select the Generative option. It identifies the fields to be extracted, provided as key-value pairs, where the key is the name of the field and the value is a description that helps the extractor identify the corresponding value. Click the field to open a prompt with the following options, provided as pairs:
    • Field name - Requires you to input the name of the field to be extracted, for example Due date (30-character limit)
    • Generative prompt - Requires you to provide the prompt as input for the Generative Extractor. (500-character limit)
    Tip: For good practices on how to use generative prompts, check the Generative Extractor - Good Practices page.
  • Input - Requires you to specify either the file itself or a Document Data object, if you have used other Document Understanding activities earlier in your workflow (for example, Classify Document).
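
The Prompt property described above takes field name and generative prompt pairs with 30-character and 500-character limits. As a minimal sketch, the pairs could be checked against those limits before they are entered in the activity. The helper name `validate_prompt_pairs`, the use of a plain dictionary, and counting characters with `len` are assumptions made for illustration; none of this is part of the UiPath activity or its API.

```python
from typing import Dict, List

# Limits taken from the Prompt property description above.
FIELD_NAME_LIMIT = 30
PROMPT_LIMIT = 500


def validate_prompt_pairs(pairs: Dict[str, str]) -> List[str]:
    """Hypothetical helper: list the problems found in field/prompt pairs.

    An empty list means every pair fits within the documented limits.
    """
    problems = []
    for name, prompt in pairs.items():
        if not name.strip():
            problems.append("field name is empty")
        if len(name) > FIELD_NAME_LIMIT:
            problems.append(
                f"field name {name!r} exceeds {FIELD_NAME_LIMIT} characters"
            )
        if len(prompt) > PROMPT_LIMIT:
            problems.append(
                f"prompt for {name!r} exceeds {PROMPT_LIMIT} characters"
            )
    return problems


if __name__ == "__main__":
    example = {"Due date": "The date by which the invoice must be paid."}
    print(validate_prompt_pairs(example))
```

How the activity itself counts characters is not stated in this page, so treat the check as a rough pre-flight guard rather than an exact mirror of the product's validation.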

Input

  • Timeout (seconds) (Preview) - Maximum execution time (in seconds) for the call to the generative model. If the operation exceeds this timeout, it is automatically terminated to prevent delays or hangs. This property is only displayed if the Generative Extractor is selected as an extractor.
Output
  • Document Data - All the extracted field data from the file. Information can also be received from Classify Document.
    In case of multi-value fields, all values are returned under Document Data. The values are available in DocumentData.Data.FieldName.MultiValues[]. If the MultiValues value is null, this means that the respective field is not a multi-value field. If the MultiValues property is an array (even if it is empty []), this means the respective field is a multi-value field.
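
The null-versus-array rule for MultiValues can be sketched as follows. This is an illustration only: each field is modeled as a plain Python dictionary with a `MultiValues` key, which is an assumption made for the example and not the actual Document Data type, and the helper names are hypothetical.

```python
from typing import Any, Dict, List, Optional


def is_multi_value(field: Dict[str, Any]) -> bool:
    """Return True when the field is a multi-value field.

    Per the rule above: MultiValues being null (None here) means the field
    is not multi-value; any array, even an empty one, means it is.
    """
    return field.get("MultiValues") is not None


def multi_values(field: Dict[str, Any]) -> Optional[List[Any]]:
    """Return the values of a multi-value field, or None for other fields."""
    values = field.get("MultiValues")
    return None if values is None else list(values)


if __name__ == "__main__":
    single = {"MultiValues": None}           # not a multi-value field
    empty_multi = {"MultiValues": []}        # multi-value field, no values
    filled = {"MultiValues": ["A-1", "A-2"]}
    for field in (single, empty_multi, filled):
        print(is_multi_value(field), multi_values(field))
```

The point of the sketch is that an empty array must not be treated like null: a truthiness check such as `if field["MultiValues"]:` would wrongly classify an empty multi-value field as a single-value field.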
Note: Data sent to the Generative Extractor is processed by an LLM instance that is not publicly available. The data does not leave that instance and, once processed, is not stored or used for training.
Note: The Extract Document Data activity uses public endpoints.
