Extract Document Data
UiPath.IntelligentOCR.StudioWeb.Activities.ExtractDocumentDataWithDocumentData<UiPath.IntelligentOCR.StudioWeb.Activities.DataExtraction.ExtendedExtractionResultForDocumentData>
Extracts data from an input file or Document Data object, and stores the results into a Document Data object (either the one received as input or a newly created one for the input file).
The Extract Document Data activity must be preceded by an activity that provides its input. This can be a Document Data object, produced as output by other Document Understanding activities, or a file:
- Document Data - from the Classify Document activity
- File - from Get File/Folder or Get Newest Email activities
The supported languages for the generative models are the same as those of the OCR engine used. For more information, check the OCR Supported languages page.
Project compatibility: Cross-platform
Properties
- Project - Requires you to select your Document Understanding project from the drop-down menu. The available options are:
  - Predefined - The default project
  - You can create a custom project by going to Document Understanding.
- Extractor - Requires you to select the Extractor from the selected project. For the Predefined Project, the available options are:
  - Any one of the ML Packages found here
  - Generative
Note: The Extract Document Data activity overrides the document type with the selected extractor. This is not applicable for generative models.
- Prompt - This field appears if you choose the Generative option. It identifies the fields to be extracted, provided as key-value pairs, where the key is the name of the field and the value is a description that helps the extractor identify the corresponding value. Click the field to be presented with the following options, provided as pairs:
  - Field name - Requires you to input the name of the field to be extracted (for example, Due date). 30-character limit.
  - Generative prompt - Requires you to provide the prompt as input for the Generative Extractor. 500-character limit.
Tip: For good practices on how to use generative prompts, check the Generative Extractor - Good Practices page.
- Input - Requires you to specify the file itself, or the Document Data object if you have used other Document Understanding activities earlier in your workflow (for example, Classify Document).
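The character limits on the prompt pairs can be checked before the values are entered into the activity. The following is a minimal Python sketch, not part of any UiPath API: the function name validate_prompt_fields and the dictionary layout are assumptions made for illustration, and only the 30- and 500-character limits come from the description above.

```python
# Hypothetical helper, not a UiPath API: checks field name / prompt pairs
# against the character limits documented for the Prompt property.
FIELD_NAME_LIMIT = 30   # documented limit for Field name
PROMPT_LIMIT = 500      # documented limit for Generative prompt


def validate_prompt_fields(fields):
    """Return a list of problems found; an empty list means every pair fits."""
    problems = []
    for name, prompt in fields.items():
        if len(name) > FIELD_NAME_LIMIT:
            problems.append(f"Field name '{name}' exceeds {FIELD_NAME_LIMIT} characters")
        if len(prompt) > PROMPT_LIMIT:
            problems.append(f"Prompt for '{name}' exceeds {PROMPT_LIMIT} characters")
    return problems


# Example pairs: key = field name, value = description used as the prompt
example = {
    "Due date": "The date by which the invoice must be paid.",
    "Total": "The total amount due, including taxes.",
}
print(validate_prompt_fields(example))
```

A name of exactly 30 characters and a prompt of exactly 500 characters are treated as valid here, on the assumption that the documented limits are inclusive.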
Input
- Timeout (seconds) (Preview) - Maximum execution time (in seconds) for the call to the generative model. If the operation exceeds this timeout, it is automatically terminated to prevent delays or hangs. This property is only displayed if the Generative Extractor is selected as an extractor.
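The timeout behavior described above (a call that is terminated once it exceeds its time budget) can be illustrated generically. The sketch below uses Python's asyncio.wait_for and is not how the activity is implemented: call_with_timeout and slow_model_call are hypothetical names, and the sleep stands in for a long-running request to a model.

```python
import asyncio


async def call_with_timeout(make_call, timeout_seconds):
    """Await make_call(); return its result, or None if the time budget is exceeded."""
    try:
        # wait_for cancels the pending call once the timeout elapses
        return await asyncio.wait_for(make_call(), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        return None


async def slow_model_call():
    # Stand-in for a long-running request; not a real UiPath or model API
    await asyncio.sleep(0.2)
    return "extracted"


# The call needs 0.2 s but is only allowed 0.05 s, so it is cancelled
print(asyncio.run(call_with_timeout(slow_model_call, 0.05)))
```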
Output
- Document Data - All the extracted field data from the file. Information can also be received from Classify Document.
In case of multi-value fields, all values are returned under Document Data. The values are available in DocumentData.Data.FieldName.MultiValues[]. If the MultiValues value is null, the respective field is not a multi-value field. If the MultiValues property is an array (even if it is empty, []), the respective field is a multi-value field.
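The null-versus-array rule can be expressed as a small check. The sketch below is plain Python, not the UiPath object model: the dictionaries stand in for extracted fields, and the key name multi_values and the function is_multi_value are assumptions made for illustration.

```python
def is_multi_value(field):
    """A field is multi-value when its MultiValues entry is an array, even an empty one."""
    # Compare against None explicitly: an empty list must still count as multi-value
    return field.get("multi_values") is not None


single = {"value": "2024-05-01", "multi_values": None}      # null: single-value field
empty_multi = {"value": None, "multi_values": []}           # empty array: multi-value field
filled_multi = {"value": None, "multi_values": ["A", "B"]}  # populated multi-value field

for field in (single, empty_multi, filled_multi):
    print(is_multi_value(field))
```

A plain truthiness test would misclassify the empty array as a single-value field, which is why the check compares against None instead.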