
Document Understanding Activities

Last updated Dec 5, 2024

Generative extractor - Good practices

Note:
  • For improved stability, the number of prompts is limited to a maximum of 50.
  • The response (the extraction result, also called the Completion) is limited to 700 words, so you cannot extract more than 700 words from a single prompt. If your extraction requirements exceed this limit, you can divide the document into multiple pages, process them individually, and then merge the results afterwards.
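As a rough illustration of guarding against the 700-word limit, the Python sketch below flags a Completion that is at or near the limit so that the workflow can fall back to page-by-page processing. The function name, the safety margin, and the whitespace-based word count are all assumptions made for this example; the extractor may count words differently.

```python
WORD_LIMIT = 700  # limit stated in the note above


def may_be_truncated(completion: str, limit: int = WORD_LIMIT, margin: int = 10) -> bool:
    """Return True when the completion is close enough to the limit to be suspicious.

    Words are counted by splitting on whitespace, which is only an approximation.
    """
    return len(completion.split()) >= limit - margin
```

A flagged answer is a hint to split the document into pages and merge the per-page results, not proof that text was cut off.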

Use precise language

Imagine asking four or five different people the question you want to ask the generative prompt. If you can imagine these people giving slightly different answers, then your language is too ambiguous and you need to rephrase to make it more precise.

For instance, if you give the prompt a general request, such as "Extract all the personal information of the Patient as comma separated key-value pairs", this expects the model to find certain information on its own.

Considering the previous request, the model has to figure out the following information on its own:
  • Where the personal information is in the document.
  • What is personal and what is not personal (which is very ambiguous).
  • What the user expects to get as "key", and what is the value for each key, and what is the exact format the user expects.
  • Should it use brackets? Or just each key-value pair on a separate line?
There are numerous steps and many different ways to answer the request. Because Generative AI is fundamentally non-deterministic, the longer the answer, the higher the probability that it will differ from one run to the next, even if the model temperature is set to zero.
To avoid this issue, break a broad request with a potentially long answer into simpler questions that each produce a short answer. For example, you can break up the general prompt into the following smaller requests:
  • "Extract the Patient First Name"
  • "Extract the Patient Last Name"
  • "Extract the Patient Street Address, including City, State and Zip Code"
  • "Extract the Patient Birth Date"
Breaking a request into several smaller ones gives higher accuracy and much more consistent, reproducible results, and reduces the need to parse long strings of text produced by the AI.
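One way to keep narrow prompts organized is to store one prompt per field and check the total against the 50-prompt limit before configuring the extractor. The Python sketch below is illustrative only: the field keys and the helper name are hypothetical and are not part of any UiPath API.

```python
MAX_PROMPTS = 50  # limit stated in the note at the top of this page

# One short, unambiguous prompt per field (field keys are hypothetical).
PATIENT_PROMPTS = {
    "first_name": "Extract the Patient First Name",
    "last_name": "Extract the Patient Last Name",
    "address": "Extract the Patient Street Address, including City, State and Zip Code",
    "birth_date": "Extract the Patient Birth Date",
}


def check_prompt_count(prompts: dict, limit: int = MAX_PROMPTS) -> int:
    """Return the number of prompts, or raise if the set exceeds the limit."""
    if len(prompts) > limit:
        raise ValueError(f"{len(prompts)} prompts exceed the limit of {limit}")
    return len(prompts)
```

Because each answer is already keyed by field, no parsing of a long combined answer is needed afterwards.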

Specify an output format

To make your question more specific, ask the extractor to return the answer in a standardized format. This reduces ambiguity, increases response accuracy, and simplifies downstream processing.

For example, if you are asking the generative prompt to get a date, specify how you want the date returned: "return date in yyyy-mm-dd format". If you just need the year, specify: "return the year, as a four-digit number".
You can also use this approach for numbers. For example, you may specify "return numbers which appear in parentheses as negative" or "return number in ##,###.## format" to standardize the decimal separator and thousands separator for easier downstream processing.
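A fixed output format also makes the answer easy to validate downstream. The following Python sketch shows one possible check for the two formats mentioned above; the function names are made up for this example, and a real automation would implement the equivalent logic in its workflow.

```python
import re
from datetime import date
from decimal import Decimal


def parse_iso_date(answer: str) -> date:
    """Parse an answer expected in yyyy-mm-dd format; raise ValueError otherwise."""
    text = answer.strip()
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
        raise ValueError(f"not in yyyy-mm-dd format: {answer!r}")
    return date.fromisoformat(text)


def parse_amount(answer: str) -> Decimal:
    """Parse an answer in ##,###.## format; a value in parentheses is negative."""
    text = answer.strip()
    negative = text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1]
    # Accept either comma-grouped thousands or plain digits, with optional decimals.
    if not re.fullmatch(r"-?\d{1,3}(,\d{3})*(\.\d+)?|-?\d+(\.\d+)?", text):
        raise ValueError(f"unexpected number format: {answer!r}")
    value = Decimal(text.replace(",", ""))
    return -value if negative else value
```

An answer that fails either check can be routed to human review instead of flowing silently into later steps.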

Provide expected options

A special case of formatting is when the answer is one of a known set of possible answers.

For example, on an application form you may ask: "What is the applicant’s marital status? Possible answers: Married, Unmarried, Separated, Divorced, Widowed, Other."

This not only simplifies downstream processing but also increases response accuracy.
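When the set of answers is known, the same list can be used both to build the prompt and to check the reply. The Python sketch below assumes hypothetical helper names and a simple case-insensitive match; it is one possible way to do this rather than a prescribed method.

```python
MARITAL_STATUS = ("Married", "Unmarried", "Separated", "Divorced", "Widowed", "Other")


def build_prompt(question: str, options) -> str:
    """Append the allowed answers to the question."""
    return f"{question} Possible answers: {', '.join(options)}."


def normalize_choice(answer: str, options):
    """Return the matching option (ignoring case and a trailing period), or None."""
    cleaned = answer.strip().strip(".").casefold()
    for option in options:
        if option.casefold() == cleaned:
            return option
    return None
```

A `None` result means the reply fell outside the expected set and is worth a manual check.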

Step by step

To maximize accuracy, break down complex questions into simple steps. Instead of asking "What is the termination date of this contract?", ask "First find the termination section of the contract, then determine the termination date, then return the date in yyyy-mm-dd format."
There are many ways to break this down. You may even write your request as a small computer program, such as the following:
Execute the following program:
1: Find termination section or clause
2: Find termination date
3: Return termination date in yyyy-mm-dd format
4: Stop

Defining what you want in a programming style, potentially even using JSON or XML syntax, forces the Generative model to use its programming skills, which increases accuracy when following instructions.
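If many fields need program-style prompts, the numbered text can be generated from a list of steps so that every prompt follows the same layout. This Python sketch uses a hypothetical helper name and simply reproduces the format shown above.

```python
def program_prompt(steps) -> str:
    """Render the steps as a numbered program that ends with a Stop instruction."""
    lines = ["Execute the following program:"]
    lines += [f"{number}: {step}" for number, step in enumerate(steps, start=1)]
    lines.append(f"{len(steps) + 1}: Stop")
    return "\n".join(lines)


termination_prompt = program_prompt([
    "Find termination section or clause",
    "Find termination date",
    "Return termination date in yyyy-mm-dd format",
])
```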

Avoid arithmetic or logic problems

Do not ask the extractor to perform sums, multiplications, subtractions, comparisons, or any other arithmetic operation. It makes basic mistakes, and it is very slow and expensive compared to a simple robot workflow, which performs these operations reliably and is much faster and cheaper.

For the same reason, do not ask it to perform complex if-then-else logic. The robot workflow is much more accurate and efficient with these kinds of operations.
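In practice this means extracting the raw values and doing the calculation afterwards. The Python sketch below shows the idea for checking an invoice total; the function name and the tolerance are assumptions made for this example, and in a UiPath automation the same logic would live in the workflow.

```python
from decimal import Decimal


def total_matches(line_amounts, stated_total, tolerance=Decimal("0.01")):
    """Sum the extracted line amounts and compare the result with the extracted total.

    Inputs are strings as returned by the extractor, for example "10.00".
    Returns the computed sum and whether it agrees with the stated total.
    """
    computed = sum((Decimal(amount) for amount in line_amounts), Decimal("0"))
    return computed, abs(computed - Decimal(stated_total)) <= tolerance
```

Using `Decimal` rather than floating-point numbers avoids rounding surprises with currency values.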

Tables

The Generative Extractor currently does not support column fields. Although you may be able to extract smaller tables through regular questions and parse their output, please note that this is only a workaround and comes with restrictions. It is neither designed nor recommended for extracting generic, arbitrarily large tables.

Extracting data from tables is a challenge for the Generative Extractor. The Generative AI technology operates on linear strings of text and does not understand visual two-dimensional information in images. It cannot extract table fields as defined in the Taxonomy Manager, but it can extract text and tables from documents.

To extract data from tables, there are at least two approaches:
  • Ask the Generative Extractor to return columns separately, and then assemble the rows yourself in a workflow. You might ask: Please return the Unit Prices on this invoice as a list, from top to bottom, in the format [<UnitPrice1>, <UnitPrice2>, …]
  • Ask it to return each row separately, as a JSON object. You might ask: Please return the line items of this invoice as a JSON array of JSON objects, each object in the format: {"description": <description>, "quantity": <quantity>, "unit_price": <unit price>, "amount": <amount>}.
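Both approaches need a small amount of post-processing. The Python sketch below illustrates one way to assemble rows from separately extracted columns and to validate a JSON array of line items. The function names are hypothetical, and the sketch assumes the column answers have already been parsed into lists.

```python
import json


def rows_from_columns(columns: dict) -> list:
    """Zip per-column lists into row dictionaries; all columns must have equal length."""
    if len({len(values) for values in columns.values()}) > 1:
        raise ValueError("columns have different lengths; rows cannot be aligned")
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


def rows_from_json(answer: str, required_keys) -> list:
    """Parse a JSON array of objects and check that every object has the required keys."""
    rows = json.loads(answer)  # raises a ValueError subclass on malformed JSON
    if not isinstance(rows, list):
        raise ValueError("expected a JSON array")
    for row in rows:
        if not isinstance(row, dict):
            raise ValueError("expected every array element to be a JSON object")
        missing = set(required_keys) - set(row)
        if missing:
            raise ValueError(f"line item is missing keys: {sorted(missing)}")
    return rows
```

The length check in the column approach matters: if one column comes back with a missing entry, the rows would otherwise be silently misaligned.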

Confidence level

Generative AI models do not provide confidence levels for their predictions. However, the goal is to detect errors, and confidence levels are just one possible way to achieve that goal, and not the best one. A much better and more reliable way to detect errors is to ask the same question in multiple different ways. The more the question statements differ, the better. If all answers converge towards a common result, then the likelihood of an error is very low. If the answers disagree, then the likelihood of an error is high.

For instance, you may repeat the same question two, three, or even five times (depending on how crucial it is to avoid uncaught errors in your procedure), combining the aforementioned suggestions in varied combinations. If all the responses are consistent, human review may not be necessary. However, if any of the replies differ, manual review by a person in Action Center may be required.
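The agreement check itself is simple to automate. The Python sketch below compares the answers to several phrasings of the same question and flags the field for review unless they all agree; the function name and the whitespace and case normalization are assumptions made for this example.

```python
from collections import Counter


def consensus(answers):
    """Return the most common answer and a flag that is True when review is needed.

    Answers are compared after collapsing whitespace and ignoring case.
    """
    if not answers:
        raise ValueError("at least one answer is required")
    normalized = [" ".join(answer.split()).casefold() for answer in answers]
    winner, count = Counter(normalized).most_common(1)[0]
    needs_review = count < len(normalized)  # any disagreement triggers review
    return answers[normalized.index(winner)].strip(), needs_review
```

Pairing this with a strict output format, such as yyyy-mm-dd for dates, keeps harmless formatting differences from being counted as disagreements.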

© 2005-2024 UiPath. All rights reserved.