Document Understanding User Guide
Checkboxes and signatures
Checkboxes and signatures play important roles in many types of documents, from contractual agreements to registration forms. Annotating them correctly is key to getting the most out of your model.
There are two types of multiple-choice checkboxes:
- Mutually exclusive checkboxes, where only one option can be selected.
- Non-mutually exclusive checkboxes, where more than one option can be selected.
An important aspect to consider is the number of options in a given multiple-choice field. Some fields have a single option, where the checkbox is either checked or not. Many others have 10, 20, or even more options, often arranged in a grid or table, which is common in health forms.
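To make the difference between the two checkbox types concrete, the Python sketch below validates a list of checked options against a field definition, the kind of check you might run after extraction. It is hypothetical and not part of any UiPath API: the `CheckboxField` class, its `mutually_exclusive` flag, the example field names and options, and the assumption that checked options arrive as a list of strings are all illustrative.

```python
from dataclasses import dataclass


@dataclass
class CheckboxField:
    """Hypothetical description of a multiple-choice field (not a UiPath type)."""
    name: str
    options: list[str]
    mutually_exclusive: bool


def validate_selection(field: CheckboxField, selected: list[str]) -> list[str]:
    """Return the problems found in the checked options (empty list if valid)."""
    problems = []
    # Every checked value must be one of the options defined for the field.
    unknown = [s for s in selected if s not in field.options]
    if unknown:
        problems.append(f"{field.name}: unknown option(s) {unknown}")
    # A mutually exclusive field allows at most one checked option.
    if field.mutually_exclusive and len(selected) > 1:
        problems.append(f"{field.name}: {len(selected)} options checked, expected at most 1")
    return problems


# Hypothetical example fields.
tax_year = CheckboxField("Tax year", ["2018", "2019"], mutually_exclusive=True)
symptoms = CheckboxField("Symptoms", ["Fever", "Cough", "Headache"], mutually_exclusive=False)

print(validate_selection(tax_year, ["2018", "2019"]))  # ['Tax year: 2 options checked, expected at most 1']
print(validate_selection(symptoms, ["Fever", "Cough"]))  # []
```

The same selection of two options is an error for the mutually exclusive field but valid for the non-mutually exclusive one, which is why it helps to record the field type alongside its options.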
There are two primary methods for annotating these multiple-choice fields.
Let's use an example to understand how you can annotate the options.
This approach has the advantage of using a single field, which requires less training data. It also does not depend on checkboxes being detected correctly. For example, if a checkbox is mistakenly recognized as the letter X, the model can still learn that this mark signifies the selection of the option next to it.
A potential disadvantage is that both options need to be roughly equally represented in your dataset, which is not always the case. For instance, if 90% of the documents in your dataset have 2018 checked, the model can become biased toward that option, and this approach may fail. The problem gets worse as the number of options grows, because some options are almost always rare. In these cases, you may need to create synthetic documents with the rare options checked to balance the dataset.
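Before training with a single field, it can help to measure how often each option is checked across your annotated documents. The Python sketch below assumes you can export the checked option of each document as a plain list of strings (a hypothetical format, not a UiPath export). It reports each option's count and share, flags rare options, and estimates how many additional documents, for example synthetic ones, each option would need to match the most common option. The `min_ratio` default of 0.5 is an arbitrary illustrative threshold, not a UiPath recommendation.

```python
from collections import Counter


def balance_report(labels: list[str], options: list[str], min_ratio: float = 0.5) -> dict:
    """Summarize how evenly the options of a single-choice field are represented.

    labels:    the checked option for each annotated document (hypothetical export).
    options:   every option the field can take, including ones never seen.
    min_ratio: an option is flagged as rare if its count is below
               min_ratio * (count of the most common option); illustrative default.
    """
    counts = Counter(labels)
    most_common = max(counts[o] for o in options)  # Counter returns 0 for unseen options
    report = {}
    for option in options:
        n = counts[option]
        report[option] = {
            "count": n,
            "share": n / len(labels) if labels else 0.0,
            "rare": n < min_ratio * most_common,
            # Extra documents needed to match the most common option.
            "to_add": most_common - n,
        }
    return report


# 90 documents with 2018 checked and 10 with 2019 checked.
labels = ["2018"] * 90 + ["2019"] * 10
for option, stats in balance_report(labels, ["2018", "2019"]).items():
    print(option, stats)
```

With this 90/10 split, 2019 is flagged as rare and would need 80 more documents to match 2018. Listing all options explicitly, rather than only those found in the labels, ensures that options no one has checked yet also show up as rare.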
This approach also simplifies the annotation process and is less sensitive to checkbox detection errors. However, it may be more sensitive to unbalanced options.
Signatures can be identified using UiPath Document OCR, allowing ML models to detect them directly.
You can annotate a signature like any other field in your document. Once the signature is identified by UiPath Document OCR, the ML model learns to recognize the field as a signature.
At inference time, the signature is extracted as it appears in the document. You then need to convert it into a boolean (Yes/No) field using RPA logic.
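The conversion to Yes/No can be as simple as checking whether the signature field was extracted with a non-empty value. The Python sketch below shows that logic outside of any specific RPA tool. The `ExtractedField` class, its `value` and `confidence` attributes, and the 0.5 confidence threshold are hypothetical stand-ins for whatever your extraction results actually contain; they are not UiPath API names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractedField:
    """Hypothetical stand-in for one extracted field in the extraction results."""
    value: Optional[str]
    confidence: float


def signature_present(field: Optional[ExtractedField], min_confidence: float = 0.5) -> bool:
    """Return True if the signature field was extracted with a non-empty value
    and a confidence at or above the (illustrative) threshold."""
    if field is None or not (field.value and field.value.strip()):
        return False
    return field.confidence >= min_confidence


def to_yes_no(flag: bool) -> str:
    """Map the boolean to the Yes/No representation."""
    return "Yes" if flag else "No"


print(to_yes_no(signature_present(ExtractedField("J. Smith", 0.92))))  # Yes
print(to_yes_no(signature_present(ExtractedField("", 0.0))))           # No
print(to_yes_no(signature_present(None)))                              # No
```

Treating a low-confidence signature as "No" is only one design choice; in a real workflow you may prefer to send such cases to human validation instead of deciding automatically.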