Document Understanding Activities
Last updated Apr 29, 2024

Document Data

Document Data is a resource that serves as both an input and an output variable within your Document Understanding workflows. The Document Data object holds all the necessary information about a single document. If you classify a document, the object includes the Document Type. If you extract data, the object contains the corresponding extracted fields. Irrespective of the activity, Document Data always contains the document's text and DOM (Document Object Model).

With Document Data, you can collect all the necessary information about a document in one variable, save data to each property of the object, and reuse it in other activities in the workflow.

Document Data holds information about the following attributes:

  • DocumentType: Provides data about the identified Document Type, populated by activities such as Classify Document or Create Classification Validation Task.
  • Data: Contains the extracted field values, populated by activities such as Extract Document Data or Create Document Validation Task.
  • FileDetails: Contains details about the IResource.
  • SubDocuments: Includes a collection of Document Data, populated by activities such as Create Classification Validation Task.
  • DocumentMetadata: Contains information about processing the document, such as:
    • Text detected language
    • Extracted fields as Data Table
    • Document Object Model (DOM), which is used by all activities
    The DocumentMetadata is updated by the activity that first processes the document. Once populated, the metadata is shared with and used by all the succeeding activities that receive the Document Data object.
    Tip: Use the File variable as input only for the first Document Understanding activity in a Studio workflow. For every subsequent Document Understanding activity, use Document Data as input.
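The attribute list above can be pictured as a small data model. The following is a minimal Python sketch, not the actual UiPath type: the class names, the snake_case property names, and the use of plain dictionaries for FileDetails and the DOM are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DocumentType:
    # Hypothetical stand-in for the DocumentType attribute.
    display_name: str = ""
    confidence: float = 0.0


@dataclass
class DocumentMetadata:
    # Hypothetical stand-in for DocumentMetadata: text, language, and DOM.
    text: str = ""
    detected_language: Optional[str] = None
    dom: Optional[dict] = None  # placeholder for the Document Object Model


@dataclass
class DocumentData:
    # One object that travels through the workflow and is filled in step by step.
    file_details: dict
    document_metadata: DocumentMetadata
    document_type: Optional[DocumentType] = None       # set by a classification step
    data: dict = field(default_factory=dict)           # set by an extraction step
    sub_documents: list = field(default_factory=list)  # nested DocumentData objects


# The first activity creates the object from a file and fills in the metadata.
doc = DocumentData(
    file_details={"FullName": "invoice.pdf", "Extension": ".pdf"},
    document_metadata=DocumentMetadata(text="Invoice 42", detected_language="en"),
)

# A later classification step only adds the document type; the metadata is reused.
doc.document_type = DocumentType(display_name="Invoice", confidence=0.93)
```

The sketch only illustrates the idea that each activity populates its own part of the same object while the metadata stays available to every later step.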

Properties

The properties of the Document Data variable can be populated and consumed by one or multiple activities. Depending on the activity populating the variable, the properties can differ.

| Attribute name | Property | Description | Activities populating the value |
|---|---|---|---|
| Document Type | DisplayName (used for custom models) | Name of the Document Type | Classify Document |
| | ID (used for out-of-the-box models) | Name of the Document Type | |
| | Confidence | Classification confidence | |
| | URL | URL where the Document Type is accessible; this can be either custom or predefined, referenced via the respective project in Document Understanding center. | |
| Fields | Field Value | Extraction value of the field | |
| | Extraction Confidence Score | Confidence score of the extraction, as provided by the model | |
| | OCR Confidence Score | Confidence score provided by the OCR engine | |
| File Details | FullName | Full name of the file | Activities creating the Document Data object, receiving a file as input |
| | Extension | Extension of the file | |
| | Page Range | Page range of the file | |
| Sub-Documents | NA | Collection of Document Data. Note: This is not currently populated and will be added in the future together with classification validation and splitting capabilities. | Classify Document |
| Metadata | NA | Information about processing the document | Activities creating the Document Data object, receiving a file as input |
| DOM | NA | The document object model, used by all activities | |
| Text | NA | All extracted text | |
| Detected Language | NA | The language detected in the document | |
| Split Confidence | NA | If the document is split, the confidence returned by the splitting model. Note: This is not currently populated and will be added in the future together with classification validation and splitting capabilities. | Classify Document |
| Results as Data Table | NA | Fields exported as Data Table | Extract Document Data |

Passing Document Data to activities

When you use Document Data, the first output object is created from your input file. After this object is created, we recommend that you pass it along to the subsequent activities. This way, those activities can reuse the Text and DOM of your original file, which saves you from re-digitizing the file each time.
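The benefit of passing the object along can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not UiPath code: digitize, classify, and extract are hypothetical functions standing in for activities, and the returned values are invented sample data.

```python
from dataclasses import dataclass, field
from typing import Optional, Union

digitize_calls = 0  # counts how often the (expensive) digitization runs


@dataclass
class DocumentData:
    # Hypothetical, reduced version of the object: text, DOM, and results.
    text: str
    dom: dict
    document_type: Optional[str] = None
    fields: dict = field(default_factory=dict)


def digitize(file_path: str) -> DocumentData:
    """Stand-in for digitization: builds the object from the input file."""
    global digitize_calls
    digitize_calls += 1
    return DocumentData(text=f"text of {file_path}", dom={"pages": 1})


def classify(source: Union[str, DocumentData]) -> DocumentData:
    """Digitizes only when given a file path; otherwise reuses the object."""
    doc = digitize(source) if isinstance(source, str) else source
    doc.document_type = "Invoice"  # sample value
    return doc


def extract(source: Union[str, DocumentData]) -> DocumentData:
    """Same rule: a DocumentData input means the Text and DOM are reused."""
    doc = digitize(source) if isinstance(source, str) else source
    doc.fields["Total"] = "42.00"  # sample value
    return doc


# First activity receives the file; the next one receives the object.
classified = classify("invoice.pdf")
extracted = extract(classified)
```

In this model, calling extract("invoice.pdf") instead of extract(classified) would run digitize a second time, which is the repeated work the recommendation above avoids.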

Consuming the extraction results for single and multi-value fields

If you configure a document type field as multi-value, the system expects multiple values, for example the answers to a multiple-choice question on a form. The results are returned as a list in the MultiValues attribute of the field. If you configure the field as single value, the system returns the result in the Value attribute of the field by default.

The following table shows you how Document Data returns single and multi-value fields:

| Field type | Has no value | Has one value | Has two or more values | DocumentData.Data.FieldName.Value | DocumentData.Data.FieldName.MultiValues |
|---|---|---|---|---|---|
| Single value | Yes | No | N/A | "" | null |
| Single value | No | Yes | N/A | <value that was identified> | null |
| Multi-value | Yes | No | No | "" | [] (empty array) |
| Multi-value | No | Yes | No | <value that was identified> | [<array with one value identical to the .Value>] |
| Multi-value | No | No | Yes | <first value that was identified> | [<array with n values, with the first value being identical to the .Value>] |
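The rules in the table can be restated as a short Python sketch. FieldResult, build_field_result, and all_values are hypothetical names introduced here for illustration; Python's None plays the role of null, and the sketch assumes a single-value field never receives more than one identified value.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FieldResult:
    # Hypothetical mirror of the .Value and .MultiValues attributes of a field.
    value: str
    multi_values: Optional[List[str]]


def build_field_result(identified: List[str], is_multi_value: bool) -> FieldResult:
    """Applies the table: Value is the first identified value or "";
    MultiValues is a list for multi-value fields and None (null) otherwise."""
    first = identified[0] if identified else ""
    if is_multi_value:
        return FieldResult(value=first, multi_values=list(identified))
    return FieldResult(value=first, multi_values=None)


def all_values(result: FieldResult) -> List[str]:
    """Reads a field uniformly, whichever way it was configured."""
    if result.multi_values is not None:
        return result.multi_values
    return [result.value] if result.value else []


single_empty = build_field_result([], is_multi_value=False)
multi_many = build_field_result(["A", "C"], is_multi_value=True)
```

A helper like all_values is one way to keep downstream logic independent of the field configuration: check MultiValues first and fall back to Value only when it is null.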

Returning extracted fields as a Data Table

You can return the fields you extracted from a document as a Data Table, using the Document Data object. You can then use the Data Table variable inside Excel activities.

To return the extracted fields as a Data Table, choose the ResultsAsDatatable output for the Extract Document Data activity.
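Conceptually, a tabular result is a flattening of the extracted fields into rows. The Python sketch below shows that idea only; the column names (Field, Value, Confidence), the input dictionary shape, and both function names are assumptions and do not describe the actual layout of the ResultsAsDatatable output.

```python
import csv
import io
from typing import Dict, List


def fields_to_rows(fields: Dict[str, dict]) -> List[dict]:
    """Flattens a {field name: {"value", "confidence"}} mapping into rows."""
    rows = []
    for name, result in fields.items():
        rows.append({
            "Field": name,
            "Value": result["value"],
            "Confidence": result["confidence"],
        })
    return rows


def rows_to_csv(rows: List[dict]) -> str:
    """Serializes the rows as CSV text, a format spreadsheet tools can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["Field", "Value", "Confidence"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Sample extraction results (invented values).
sample_fields = {
    "InvoiceNumber": {"value": "INV-001", "confidence": 0.97},
    "Total": {"value": "42.00", "confidence": 0.98},
}
table_rows = fields_to_rows(sample_fields)
csv_text = rows_to_csv(table_rows)
```

One row per field keeps the structure easy to filter or write to a worksheet, which is the kind of use the Excel activities mentioned above are suited for.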

© 2005-2024 UiPath. All rights reserved.