- Overview
- Document Processing Contracts
- Release notes
- About the Document Processing Contracts
- Box Class
- IPersistedActivity interface
- PrettyBoxConverter Class
- IClassifierActivity Interface
- IClassifierCapabilitiesProvider Interface
- ClassifierDocumentType Class
- ClassifierResult Class
- ClassifierCodeActivity Class
- ClassifierNativeActivity Class
- ClassifierAsyncCodeActivity Class
- ClassifierDocumentTypeCapability Class
- ExtractorAsyncCodeActivity Class
- ExtractorCodeActivity Class
- ExtractorDocumentType Class
- ExtractorDocumentTypeCapabilities Class
- ExtractorFieldCapability Class
- ExtractorNativeActivity Class
- ExtractorResult Class
- ICapabilitiesProvider Interface
- IExtractorActivity Interface
- ExtractorPayload Class
- DocumentActionPriority Enum
- DocumentActionData Class
- DocumentActionStatus Enum
- DocumentActionType Enum
- DocumentClassificationActionData Class
- DocumentValidationActionData Class
- UserData Class
- Document Class
- DocumentSplittingResult Class
- DomExtensions Class
- Page Class
- PageSection Class
- Polygon Class
- PolygonConverter Class
- Metadata Class
- WordGroup Class
- Word Class
- ProcessingSource Enum
- ResultsTableCell Class
- ResultsTableValue Class
- ResultsTableColumnInfo Class
- ResultsTable Class
- Rotation Enum
- SectionType Enum
- WordGroupType Enum
- IDocumentTextProjection Interface
- ClassificationResult Class
- ExtractionResult Class
- ResultsDocument Class
- ResultsDocumentBounds Class
- ResultsDataPoint Class
- ResultsValue Class
- ResultsContentReference Class
- ResultsValueTokens Class
- ResultsDerivedField Class
- ResultsDataSource Enum
- ResultConstants Class
- SimpleFieldValue Class
- TableFieldValue Class
- DocumentGroup Class
- DocumentTaxonomy Class
- DocumentType Class
- Field Class
- FieldType Enum
- LanguageInfo Class
- MetadataEntry Class
- TextType Enum
- TypeField Class
- ITrackingActivity Interface
- ITrainableActivity Interface
- ITrainableClassifierActivity Interface
- ITrainableExtractorActivity Interface
- TrainableClassifierAsyncCodeActivity Class
- TrainableClassifierCodeActivity Class
- TrainableClassifierNativeActivity Class
- TrainableExtractorAsyncCodeActivity Class
- TrainableExtractorCodeActivity Class
- TrainableExtractorNativeActivity Class
- Document Understanding Digitizer
- Document Understanding ML
- Document Understanding OCR Local Server
- Document Understanding
- Release notes
- About the Document Understanding activity package
- Project compatibility
- Set PDF Password
- Merge PDFs
- Get PDF Page Count
- Extract PDF Text
- Extract PDF Images
- Extract PDF Page Range
- Extract Document Data
- Create Validation Task and Wait
- Wait for Validation Task and Resume
- Create Validation Task
- Classify Document
- Create Classification Validation Task
- Create Classification Validation Task and Wait
- Wait for Classification Validation Task and Resume
- Create Pre-Hire in Workday Based on CV
- Generative extractor - Good practices
- Generative classifier - Good practices
- Intelligent OCR
- Release notes
- About the IntelligentOCR activity package
- Project compatibility
- Configuring Authentication
- Load Taxonomy
- Digitize Document
- Classify Document Scope
- Keyword Based Classifier
- Document Understanding Project Classifier
- Intelligent Keyword Classifier
- Create Document Classification Action
- Wait For Document Classification Action And Resume
- Train Classifiers Scope
- Keyword Based Classifier Trainer
- Intelligent Keyword Classifier Trainer
- Data Extraction Scope
- Document Understanding Project Extractor
- RegEx Based Extractor
- Form Extractor
- Intelligent Form Extractor
- Present Validation Station
- Create Document Validation Action
- Wait For Document Validation Action And Resume
- Train Extractors Scope
- Export Extraction Results
- ML Services
- OCR
- OCR Contracts
- Release notes
- About the OCR Contracts
- Project compatibility
- IOCRActivity Interface
- OCRAsyncCodeActivity Class
- OCRCodeActivity Class
- OCRNativeActivity Class
- Character Class
- OCRResult Class
- Word Class
- FontStyles Enum
- OCRRotation Enum
- OCRCapabilities Class
- OCRScrapeBase Class
- OCRScrapeFactory Class
- ScrapeControlBase Class
- ScrapeEngineUsages Enum
- ScrapeEngineBase
- ScrapeEngineFactory Class
- ScrapeEngineProvider Class
- OmniPage
- PDF
- [Unlisted] Abbyy
- [Unlisted] Abbyy Embedded
Document Understanding Activities
Generative extractor - Good practices
- For improved stability, the number of prompts is limited to maximum 50.
- The response, extraction result, also called Completion, has a word limit of 700. This is limited to 700 words. This means that you can't extract more than 700 words from a single prompt. If your extraction requirements exceed this limit, you can divide the document into multiple pages, process them individually, and then merge the results afterwards.
Imagine you are asking four or five different persons the question you would like to ask the generative prompt. If you can imagine these people giving slightly different answers, then your language is too ambiguous and you need to rephrase to make it more precise.
To make your question more specific, ask the extractor to return the answer in a standardized format. This reduces ambiguity, increases response accuracy, and simplifies downstream processing.
return date in yyyy-mm-dd
format
. If you just need the year, specify: return the year, as a
four digit number
.
return numbers which appear in parentheses as negative
or return number in ##,###.## format
to standardize the decimal
separator and thousands separator for easier downstream processing.
A special case of formatting is when the answer is one of a known set of possible answers.
What is the applicant’s marital status? Possible answers: Married,
Unmarried, Separated, Divorced, Widowed, Other.
This not only simplifies downstream processing but also increases response accuracy.
What is the termination date of this contract?
, you should
ask First find termination section of contract, then determine termination
date, then return date in yyyy-mm-dd format.
Execute the following program:
1: Find termination section or clause
2: Find termination date
3: Return termination date in yyyy-mm-dd format
4: Stop
Execute the following program:
1: Find termination section or clause
2: Find termination date
3: Return termination date in yyyy-mm-dd format
4: Stop
Defining what you want in a programming style, potentially even using JSON or XML syntax, forces the Generative model to use its programming skills, which increases accuracy when following instructions.
Do not ask the extractor to perform sums, multiplication, subtraction, comparisons, or any other arithmetic operation, because it makes basic mistakes, besides being very slow and expensive compared to a simple robot workflow, which will never make a mistake, and is much faster and cheaper.
Do not ask it to perform complex if-then-else type logic, for the same reason as above. The robot workflow is much more accurate and efficient with this kind of operations.
The Generative Extractor currently does not support column fields. Although you may be able to extract smaller tables through regular questions and parse their output, please note that this is only a workaround and comes with restrictions. It is neither designed nor recommended for extracting generic, arbitrarily large tables.
- One approach is to ask the
Generative extractor to return columns separately, and then assemble the
rows yourself in a workflow. In this case, you might ask:
Please return the Unit Prices on this invoice, as a list from top to bottom, as a list in the format [<UnitPrice1>, <UnitPrice2>,…]
- Another approach is to ask it
to return each row separately, as a JSON object. In this case, you might
ask:
Please return the line items of this invoice as an JSON array of JSON objects, each object in format: {"description”: <description>, “quantity”:<quantity>, “unit_price”:<unit price>, “amount”:<amount>}
.
Generative AI models do not provide confidence levels for the predictions. However the goal is to detect errors, and confidence levels is just one possible way to achieve that goal – and not the best one. A much better and more reliable way to detect errors is to ask the same question in multiple different ways. The more different the question statement, the better. If all answers converge towards a common result, then the likelihood of an error is very low. If the answers disagree, then likelihood of error is high.
For instance, you may repeat the same question two, three, or even five times (depending on how crucial it is to avoid uncaught errors in your procedure), combining the aforementioned suggestions in varied combinations. If all the responses are consistent, human review may not be necessary. However, if any of the replies differ, manual review by a person in Action Center may be required.