Annotate documents
After you create your project and upload documents to a document type, the documents are automatically pre-annotated by specialized models, based on the document type's schema. The schema defines the fields you want to extract from that document type. To view the schema, go to the Annotation page and check the Fields section.
Pre-annotations are indicated with underlines on the text within the document and they can't be deleted. If they are incorrect and cannot be matched to a particular field, you can ignore them. During the training process, only confirmed fields are used for training, while the underlines are not taken into account.
As you add more annotations, the pre-annotated underlines should progressively align with your input. You may see some inconsistencies between underlines and user-annotated fields at the beginning, but as the model improves, the underlines should match the user-supplied data more precisely.
In the following image, the Shipping Address has been incorrectly pre-annotated to include the person's name.
To fix this, you only need to confirm the Shipping Address. It's not necessary to remove the underlined text related to the name. As you continue with your annotation and correct such errors, the occasions when the underlined text doesn't align with the confirmed field should decrease.
- Custom document types are not automatically pre-annotated. You need to manually annotate documents that belong to a custom document type.
- To trigger model training, a minimum of 40 operations is needed. For example, if you have 20 documents, you would need to annotate at least 2 fields per document, resulting in a total of 40 operations.
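The arithmetic in the note above can be sketched in a few lines of Python. This is only an illustration: the 40-operation threshold comes from the note, while the function name, the assumption that each annotated field counts as one operation, and the assumption that fields are spread evenly across documents are ours.

```python
import math

# Threshold stated in the note above.
MIN_OPERATIONS = 40


def min_fields_per_document(num_documents: int,
                            min_operations: int = MIN_OPERATIONS) -> int:
    """Smallest whole number of fields to annotate per document so that
    num_documents * fields >= min_operations.

    Assumption: one annotated field counts as one operation.
    """
    if num_documents <= 0:
        raise ValueError("num_documents must be positive")
    return math.ceil(min_operations / num_documents)


print(min_fields_per_document(20))  # 2, matching the example in the note
print(min_fields_per_document(15))  # 3, because 15 * 2 = 30 is below 40
```

With fewer documents, each document needs more annotated fields to reach the same threshold.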
After all documents are uploaded and pre-annotated, your goal is to validate or correct the pre-annotated fields. If all fields in a document are accurately pre-annotated, select Confirm to approve them all at once. A confirmed document is marked with a green shield symbol in the document list.
A partially confirmed document is marked with an empty shield symbol in the document list, which indicates that annotation of that document is In Progress. Aim to bring every document to the Confirmed state.
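If you track annotation progress outside the product, the status rules described here can be expressed as a small function. This is a sketch, not part of any product API: the function name, the per-field confirmation flags, the sample file names, and the "Not started" label for a document with no confirmed fields are all assumptions made for illustration.

```python
from typing import Dict


def document_status(fields_confirmed: Dict[str, bool]) -> str:
    """Classify a document from hypothetical per-field confirmation flags."""
    if fields_confirmed and all(fields_confirmed.values()):
        return "Confirmed"    # shown with a green shield in the document list
    if any(fields_confirmed.values()):
        return "In Progress"  # shown with an empty shield in the document list
    return "Not started"      # assumption: the source does not name this state


# Hypothetical sample data: field name -> whether the field was confirmed.
docs = {
    "invoice_01.pdf": {"Shipping Address": True, "Total": True},
    "invoice_02.pdf": {"Shipping Address": True, "Total": False},
}

statuses = {name: document_status(fields) for name, fields in docs.items()}
remaining = [name for name, status in statuses.items() if status != "Confirmed"]

print(statuses)
print(remaining)  # documents that still need work
```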
When you review a field, one of the following cases applies:
- Pre-annotation is correct and should be validated.
- Pre-annotation is not correct and the field is present on the document.
- Pre-annotation is not correct and the field is missing from the document.
- There is no pre-annotation.
If the pre-annotation is incorrect, select the correct text and field, then select Confirm.
You can change the document type settings from the Annotate view.
To do so, click on the three-dot icon ⁝ on the right side of the document type name and select Settings.
- Base model: Dataset size estimations used in the Recommended Actions depend on the base model used for training. Choosing the base model most similar to your document type reduces the amount of annotation work required.
- Number of languages: Dataset size estimations used in the Recommended Actions depend on the number of languages in the dataset. More languages generally require annotating more data.