- Overview
- Document Processing Contracts
- Release notes
- About the Document Processing Contracts
- Box Class
- IPersistedActivity interface
- PrettyBoxConverter Class
- IClassifierActivity Interface
- IClassifierCapabilitiesProvider Interface
- ClassifierDocumentType Class
- ClassifierResult Class
- ClassifierCodeActivity Class
- ClassifierNativeActivity Class
- ClassifierAsyncCodeActivity Class
- ClassifierDocumentTypeCapability Class
- ExtractorAsyncCodeActivity Class
- ExtractorCodeActivity Class
- ExtractorDocumentType Class
- ExtractorDocumentTypeCapabilities Class
- ExtractorFieldCapability Class
- ExtractorNativeActivity Class
- ExtractorResult Class
- ICapabilitiesProvider Interface
- IExtractorActivity Interface
- ExtractorPayload Class
- DocumentActionPriority Enum
- DocumentActionData Class
- DocumentActionStatus Enum
- DocumentActionType Enum
- DocumentClassificationActionData Class
- DocumentValidationActionData Class
- UserData Class
- Document Class
- DocumentSplittingResult Class
- DomExtensions Class
- Page Class
- PageSection Class
- Polygon Class
- PolygonConverter Class
- Metadata Class
- WordGroup Class
- Word Class
- ProcessingSource Enum
- ResultsTableCell Class
- ResultsTableValue Class
- ResultsTableColumnInfo Class
- ResultsTable Class
- Rotation Enum
- SectionType Enum
- WordGroupType Enum
- IDocumentTextProjection Interface
- ClassificationResult Class
- ExtractionResult Class
- ResultsDocument Class
- ResultsDocumentBounds Class
- ResultsDataPoint Class
- ResultsValue Class
- ResultsContentReference Class
- ResultsValueTokens Class
- ResultsDerivedField Class
- ResultsDataSource Enum
- ResultConstants Class
- SimpleFieldValue Class
- TableFieldValue Class
- DocumentGroup Class
- DocumentTaxonomy Class
- DocumentType Class
- Field Class
- FieldType Enum
- LanguageInfo Class
- MetadataEntry Class
- TextType Enum
- TypeField Class
- ITrackingActivity Interface
- ITrainableActivity Interface
- ITrainableClassifierActivity Interface
- ITrainableExtractorActivity Interface
- TrainableClassifierAsyncCodeActivity Class
- TrainableClassifierCodeActivity Class
- TrainableClassifierNativeActivity Class
- TrainableExtractorAsyncCodeActivity Class
- TrainableExtractorCodeActivity Class
- TrainableExtractorNativeActivity Class
- Document Understanding Digitizer
- Document Understanding ML
- Document Understanding OCR Local Server
- Document Understanding
- Release notes
- About the Document Understanding activity package
- Project compatibility
- Set PDF Password
- Merge PDFs
- Get PDF Page Count
- Extract PDF Text
- Extract PDF Images
- Extract PDF Page Range
- Extract Document Data
- Create Validation Task and Wait
- Wait for Validation Task and Resume
- Create Validation Task
- Classify Document
- Create Classification Validation Task
- Create Classification Validation Task and Wait
- Wait for Classification Validation Task and Resume
- Intelligent OCR
- Release notes
- About the IntelligentOCR activity package
- Project compatibility
- Configuring Authentication
- Load Taxonomy
- Digitize Document
- Classify Document Scope
- Keyword Based Classifier
- Document Understanding Project Classifier
- Intelligent Keyword Classifier
- Create Document Classification Action
- Wait For Document Classification Action And Resume
- Train Classifiers Scope
- Keyword Based Classifier Trainer
- Intelligent Keyword Classifier Trainer
- Data Extraction Scope
- Document Understanding Project Extractor
- RegEx Based Extractor
- Form Extractor
- Intelligent Form Extractor
- Present Validation Station
- Create Document Validation Action
- Wait For Document Validation Action And Resume
- Train Extractors Scope
- Export Extraction Results
- ML Services
- OCR
- OCR Contracts
- Release notes
- About the OCR Contracts
- Project compatibility
- IOCRActivity Interface
- OCRAsyncCodeActivity Class
- OCRCodeActivity Class
- OCRNativeActivity Class
- Character Class
- OCRResult Class
- Word Class
- FontStyles Enum
- OCRRotation Enum
- OCRCapabilities Class
- OCRScrapeBase Class
- OCRScrapeFactory Class
- ScrapeControlBase Class
- ScrapeEngineUsages Enum
- ScrapeEngineBase
- ScrapeEngineFactory Class
- ScrapeEngineProvider Class
- OmniPage
- PDF
- [Unlisted] Abbyy
- [Unlisted] Abbyy Embedded
Generative extractor - Good practices
- For improved stability, the number of prompts is limited to maximum 50.
- The response, extraction result, also called Completion, has a word limit of 700. This is limited to 700 words. This means that you can't extract more than 700 words from a single prompt. If your extraction requirements exceed this limit, you can divide the document into multiple pages, process them individually, and then merge the results afterwards.
Imagine asking four or five different people the question you want to ask the generative prompt. If you can imagine these people giving slightly different answers, then your language is too ambiguous and you need to rephrase to make it more precise.
For instance, if you give the prompt a general request, such as "Extract all the personal information of the Patient as comma separated key-value pairs", this expects the model to find certain information on its own.
- Where the personal information is in the document.
- What is personal and what is not personal (which is very ambiguous).
- What the user expects to get as "key", and what is the value for each key, and what is the exact format the user expects.
- Should it use brackets? Or just each key-value pair on a separate line?
- "Extract the Patient First Name"
- "Extract the Patient Last Name"
- "Extract the Patient Street Address, including City, State and Zip Code"
- "Extract the Patient Birth Date"
To make your question more specific, ask the extractor to return the answer in a standardized format. This reduces ambiguity, increases response accuracy, and simplifies downstream processing.
return date in yyyy-mm-dd
format
. If you just need the year, specify: return the year, as a four
digit number
.
return numbers which appear in parentheses as negative
or
return number in ##,###.## format
to standardize the decimal separator and
thousands separator for easier downstream processing.
A special case of formatting is when the answer is one of a known set of possible answers.
What
is the applicant’s marital status? Possible answers: Married, Unmarried, Separated,
Divorced, Widowed, Other.
This not only simplifies downstream processing but also increases response accuracy.
What is the termination date of this contract?
, you should ask
First find termination section of contract, then determine termination date, then
return date in yyyy-mm-dd format.
Execute the following program:
1: Find termination section or clause
2: Find termination date
3: Return termination date in yyyy-mm-dd format
4: Stop
Execute the following program:
1: Find termination section or clause
2: Find termination date
3: Return termination date in yyyy-mm-dd format
4: Stop
Defining what you want in a programming style, potentially even using JSON or XML syntax, forces the Generative model to use its programming skills, which increases accuracy when following instructions.
Do not ask the extractor to perform sums, multiplication, subtraction, comparisons, or any other arithmetic operation, because it makes basic mistakes, besides being very slow and expensive compared to a simple robot workflow, which will never make a mistake, and is much faster and cheaper.
Do not ask it to perform complex if-then-else type logic, for the same reason as above. The robot workflow is much more accurate and efficient with this kind of operations.
The Generative Extractor currently does not support column fields. Although you may be able to extract smaller tables through regular questions and parse their output, please note that this is only a workaround and comes with restrictions. It is neither designed nor recommended for extracting generic, arbitrarily large tables.
Extracting data from tables is a challenge for the Generative Extractor. The Generative AI technology operates on linear strings of text and does not understand visual two-dimensional information in images. It cannot extract table fields as defined in the Taxonomy Manager, but it can extract text and tables from documents.
- Ask the Generative Extractor to return
columns separately, and then assemble the rows yourself in a workflow. You might ask:
Please return the Unit Prices on this invoice, as a list from top to bottom, as a list in the format [<UnitPrice1>, <UnitPrice2>,…]
- Ask it to return each row separately,
as a JSON object. You might ask:
Please return the line items of this invoice as an JSON array of JSON objects, each object in format: {"description”: <description>, “quantity”:<quantity>, “unit_price”:<unit price>, “amount”:<amount>}
.
Generative AI models do not provide confidence levels for the predictions. However the goal is to detect errors, and confidence levels is just one possible way to achieve that goal, and not the best one. A much better and more reliable way to detect errors is to ask the same question in multiple different ways. The more different the question statement, the better. If all answers converge towards a common result, then the likelihood of an error is very low. If the answers disagree, then likelihood of error is high.
For instance, you may repeat the same question two, three, or even five times (depending on how crucial it is to avoid uncaught errors in your procedure), combining the aforementioned suggestions in varied combinations. If all the responses are consistent, human review may not be necessary. However, if any of the replies differ, manual review by a person in Action Center may be required.