Document Understanding User Guide

Last updated Nov 7, 2024

Document types

A document type is the definition of a logical type of document that different business processes must handle.

What is a document type and what can it contain?

Examples of document types include invoices, medical records, IRS Forms W-2, and contracts. Besides a name, group, and category, a document type usually contains a collection of fields.

For example, invoices usually contain the following information:
  • Vendor name, vendor address, billing name, billing address
  • Invoice number, purchase order number, payment terms, due date
  • Net amount, tax amount, discount, total amount
  • VAT number, VAT rate
  • Bank account number, bank name, SWIFT, IBAN
Figure 1. Invoice example
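
The description above (a name, group, and category plus a collection of fields) can be pictured as a small data structure. The following is only an illustrative sketch in Python, not the product's actual schema: the class names `DocumentType` and `Field`, and the group and category values, are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    """One piece of information to capture, for example 'Invoice number'."""
    name: str


@dataclass
class DocumentType:
    """A logical type of document: name, group, category, and its fields."""
    name: str
    group: str
    category: str
    fields: List[Field] = field(default_factory=list)


# Field names are taken from the invoice example above.
# The group and category values are hypothetical placeholders.
invoice = DocumentType(
    name="Invoice",
    group="Finance",
    category="Payables",
    fields=[
        Field(n)
        for n in ("Vendor name", "Invoice number", "Due date", "Total amount", "IBAN")
    ],
)

print([f.name for f in invoice.fields])
```

Keeping the fields as a list on the document type mirrors the idea that the type itself defines which pieces of information a business process expects to find.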

Document type formats

Document types can be classified based on their format. Some document types have very structured content, while others mainly consist of free text.

Documents are classified into three main formats:
  • Structured
  • Semi-structured
  • Unstructured
Note: A document can often combine these three formats. A file can have a structured heading followed by unstructured, free-form content. It can also contain unstructured content in which specific information always appears in a very structured or repeating context.
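
As a rough illustration of the three formats and of a file that mixes them, the sketch below tags each section of a file with a format and checks whether more than one format is present. The names `DocumentFormat`, `formats_in`, and `mixed_file` are hypothetical and exist only for this example.

```python
from enum import Enum


class DocumentFormat(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


def formats_in(sections):
    """Return the distinct formats found in a file, in first-seen order."""
    seen = []
    for _section_name, fmt in sections:
        if fmt not in seen:
            seen.append(fmt)
    return seen


# Hypothetical file: a structured heading followed by free-form content.
mixed_file = [
    ("heading", DocumentFormat.STRUCTURED),
    ("body", DocumentFormat.UNSTRUCTURED),
]

is_mixed = len(formats_in(mixed_file)) > 1
print(is_mixed)
```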

Structured documents

Structured documents include:
  • Surveys
  • Questionnaires
  • Tax forms
  • Passports
  • Licenses
  • Time sheets

These documents are designed to collect information in a specific format. They typically contain key-value pairs, tables, handwritten text, signatures, and checkboxes. These documents guide the user by providing precise areas for entering each piece of data. Such documents are commonly called forms and are used to collect low-diversity data.

Figure 2. Driver license, an example of a structured document

Semi-structured documents

Semi-structured documents do not follow a strict format the way structured forms do and are not bound to specified data fields. They have no fixed form but follow a common enough format, with both fixed and variable parts, such as tables. They may contain paragraphs as well, but data is mainly found in key-value pairs. Semi-structured documents include:
  • Invoices
  • Receipts
  • Purchase orders
  • Healthcare lab reports
  • Bank statements
  • Utility bills
Figure 3. Invoice, an example of a semi-structured document

Unstructured documents

Unstructured documents are files that do not follow a specific or organized model. They have no fixed format, and the information they contain is mostly free text. Humans can easily understand these documents, but the data is challenging for robots to interpret and process. Unstructured documents can take many forms, including:
  • Contracts
  • Leases
  • Annual reports
  • Agreements
  • News articles
Figure 4. License agreement, an example of an unstructured document