Document Data
Document Understanding Activities
Last updated Apr 10, 2024
To work with documents efficiently, activities in the
UiPath.DocumentUnderstanding.Activities package use the Document Data object
as input or output. The object may contain the following:
- Document type, populated by the Classify Document activity.
- Data (fields), populated by the Extract Document Data activity.
- Text and Document Object Model, populated by the first UiPath.DocumentUnderstanding.Activities activity in the workflow (the one that processes the input file) and used by all subsequent activities.
- Other properties which may come in handy when implementing automations.
The object contains all information about the processed document, gathered into one resource.
Tip: Use the File variable as input only for the first Document
Understanding activity in a Studio workflow. For every subsequent Document
Understanding activity, use Document Data as input.
The properties of the Document Data variable can be populated and consumed by one or multiple activities. Depending on the activity populating the variable, the properties can differ.
Note: The following changes apply to preview releases starting with the
v2.5.0-preview release:
- The Name property of the Document Type attribute is replaced with the following:
  - DisplayName for custom models
  - ID for out-of-the-box models
- Two new properties are added, populated from the result of the Document
Understanding framework:
  - ID
  - DisplayName
| Attribute name | Property | Description | Activities populating the value |
|---|---|---|---|
| Document Type | Name | Name of the Document Type | Classify Document |
| Document Type | Confidence | Classification confidence | Classify Document |
| Document Type | URL | URL where the Document Type is accessible; this can be either custom or predefined, referenced via the respective project in Document Understanding center. | Classify Document |
| Fields | Field Value | Extraction value of the field | Extract Document Data |
| Fields | Extraction Confidence Score | Confidence score of the extraction, as provided by the model | Extract Document Data |
| Fields | OCR Confidence Score | Confidence score provided by the OCR engine | Extract Document Data |
| File Details | FullName | Full name of the file | Activities creating the Document Data object, receiving a file as input |
| File Details | Extension | Extension of the file | Activities creating the Document Data object, receiving a file as input |
| File Details | Page Range | Page range of the file | Activities creating the Document Data object, receiving a file as input |
| Sub-Documents | - | Collection of Document Data. Note: This is not currently populated and will be added in the future together with classification validation and splitting capabilities. | Classify Document |
| Metadata | - | Information about processing the document | Activities creating the Document Data object, receiving a file as input |
| DOM | - | The document object model, used by all activities | Activities creating the Document Data object, receiving a file as input |
| Text | - | All extracted text | Activities creating the Document Data object, receiving a file as input |
| Detected Language | - | The language detected in the document | Activities creating the Document Data object, receiving a file as input |
| Split Confidence | - | If the document is split, the confidence returned by the splitting model. Note: This is not currently populated and will be added in the future together with classification validation and splitting capabilities. | Classify Document |
| Results as Data Table | - | Fields exported as Data Table | Extract Document Data |
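
To make the shape of the object easier to picture, the sketch below models a few of the attributes from the table as plain Python dataclasses and shows how an automation might flag low-confidence fields for review. It is an illustration only and not the UiPath type or API: the class names, attribute names, the `fields_needing_review` helper, the 0 to 1 confidence scale, the 0.8 threshold, and the sample values are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DocumentType:
    # Hypothetical mirror of the Document Type attribute (set by a classification step).
    name: str
    confidence: float
    url: str


@dataclass
class FieldResult:
    # Hypothetical mirror of one entry under Fields (set by an extraction step).
    value: str
    extraction_confidence: float
    ocr_confidence: float


@dataclass
class FileDetails:
    # Hypothetical mirror of File Details (set when the object is created from a file).
    full_name: str
    extension: str
    page_range: str


@dataclass
class DocumentData:
    # Hypothetical stand-in for the Document Data object; not the UiPath type.
    file_details: FileDetails
    text: str = ""
    detected_language: Optional[str] = None
    document_type: Optional[DocumentType] = None  # None until a classifier runs
    fields: Dict[str, FieldResult] = field(default_factory=dict)  # empty until extraction


def fields_needing_review(doc: DocumentData, threshold: float = 0.8) -> List[str]:
    """Return the names of fields whose lower confidence score is below the threshold."""
    return sorted(
        name
        for name, result in doc.fields.items()
        if min(result.extraction_confidence, result.ocr_confidence) < threshold
    )


# The first step creates the object from a file; later steps add to the same object.
doc = DocumentData(
    file_details=FileDetails("invoice.pdf", ".pdf", "1-2"),
    text="Invoice 123 Total 99.50",
    detected_language="en",
)
doc.document_type = DocumentType("Invoice", 0.97, "https://example.invalid/doc-types/invoice")
doc.fields["Total"] = FieldResult("99.50", 0.95, 0.91)
doc.fields["Vendor"] = FieldResult("ACME", 0.62, 0.88)

print(fields_needing_review(doc))  # prints ['Vendor']
```

The sketch keeps classification and extraction results optional because, as described above, which properties are populated depends on which activities have already run against the object.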