Unstructured and complex documents user guide
This feature is available in preview in the Japan region.
Overview
The Layout model for extended languages is an optional intelligent pre-processing step in IXP that is used together with the main extraction model. It improves how documents are represented before the LLM processes them, which improves extraction accuracy for documents where the position of information on the page matters, such as tables, forms, multi-column layouts, or checkbox-based documents.
Before the extraction model processes the content, the Layout model (extended languages) analyzes the visual structure of the document, including how text and other elements are arranged on the page. It is specifically designed to improve accuracy for documents that use extended character sets and complex layouts, where other pre-processing options may capture the structure of the document less accurately.
Benefits
The Layout model (extended languages) feature provides the following benefits:
- Structure-focused pre-processing – Improves how document layout is interpreted during extraction, without changing prompts or switching models. It is particularly effective when accuracy depends more on document structure than on the meaning of the content.
- More accurate row/column mapping – Preserves the relationships between rows and columns in documents where structure is critical.
- Enhanced checkbox extraction – More reliable detection and mapping of checkbox fields.
- Seamless workflow integration – Works within the existing IXP process for testing, validation, scoring, and version comparison.
Using the Layout model
To use the Layout model, proceed as follows:
- Navigate to the Build tab.
- Select Model configuration.
- Under Intelligent pre-processing, select Layout model (extended languages).
When to use the Layout model
Use this feature when the extraction accuracy with the None, Table model - mini, or Table model pre-processing options is lower than expected.
This feature is particularly effective in the following scenarios:
- Dense financial statements and reports – Brokerage statements, loan applications, service reports, and other documents where multiple rows, sections, and nested tables need to stay aligned.
- Checkbox-heavy forms – Insurance, healthcare, onboarding, and regulated forms with many adjacent or repeated checkboxes, where the main failure mode is incorrect checkbox-to-field mapping.
- Operational line-item documents – Packing lists, insertion orders, service orders, manifests, and similar documents where correct row recognition is more important than broad semantic understanding.
- Low-performing document sets – Document families that underperform with standard extraction, especially when errors stem from row/column or checkbox mismatches rather than instruction issues.
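To judge whether accuracy with another pre-processing option is "lower than expected", it can help to compare extracted values against validated values field by field. The following is a minimal sketch, not an IXP API: it assumes you have already obtained predictions and ground truth as plain Python dictionaries, and the function names, field names, threshold, and sample values are all hypothetical.

```python
from collections import defaultdict


def field_accuracy(predictions, ground_truth):
    """Return the fraction of correctly extracted values per field.

    Both arguments are lists of dicts (one dict per document) that map
    a field name to its value. This data layout is an assumption for
    illustration, not an IXP export format.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for predicted, expected in zip(predictions, ground_truth):
        for field, expected_value in expected.items():
            total[field] += 1
            if predicted.get(field) == expected_value:
                correct[field] += 1
    return {field: correct[field] / total[field] for field in total}


def fields_to_review(accuracy, threshold=0.9):
    """List the fields whose accuracy falls below the chosen threshold."""
    return sorted(field for field, score in accuracy.items() if score < threshold)


# Made-up values for two documents, for illustration only.
ground_truth = [
    {"checkbox_a": False, "checkbox_b": False},
    {"checkbox_a": True, "checkbox_b": False},
]
predictions = [
    {"checkbox_a": True, "checkbox_b": False},  # checkbox_a is wrongly selected
    {"checkbox_a": True, "checkbox_b": False},
]

accuracy = field_accuracy(predictions, ground_truth)
print(fields_to_review(accuracy))
```

If the fields flagged this way are mostly checkboxes or table cells, the errors are likely structural, which is the situation the Layout model (extended languages) is intended for.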
Example of Layout model (extended languages) pre-processing
The following image shows an extraction that queries the LLM without using the Layout model (extended languages). Single-Closing and No Cash Out are incorrectly extracted as selected by the applicant.
The following image shows the same extraction using the Layout model (extended languages), where the values of both fields are extracted correctly. No instruction change was applied.