Document Understanding Modern Projects User Guide
When creating a new project, tenants located in Europe, the US and Japan may enable our new splitter and classifier model. This trainable model can automatically split and classify complex documents, enabling you to turn messy packets into clean, typed documents.
Follow the instructions on this page to create a Document Understanding™ project and enable the new splitter and classifier model:
- Open Document Understanding.
- Select Create project.
- Fill in the desired project name.
- Select Modern to use the modern experience.
Note: This is a guided model building experience that also includes recommendations for optimal model performance and active learning.
- Switch on the Enable new splitter and classifier model toggle.
- Configure the advanced options, if needed.
- Switch on the Enable splitting toggle to enable the model to automatically split documents into individual files before classification. You can also enable this feature from the Project settings screen.
Important: When the Enable splitting option is turned off, all documents are classified as a whole.
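- From the OCR method drop-down list, select the OCR engine to use for the new project.
- Fill in the OCR API key.
Note: If you select UiPath™ OCR, this field is filled in automatically.
- Fill in the OCR URL. For the full list of UiPath OCR URLs, check the Public endpoints page.
- Choose whether to apply OCR on PDFs. By default, this is set to Auto.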
- Select Create.
Figure 1. Creating your first project
After the project is created successfully, you can upload documents from the Build section.
Choose one of the two available options:
- Extract data from documents: pulls specific fields, such as invoice numbers, dates, and totals, from your documents.
Note: We recommend choosing this option when you need structured data for automation or reporting.
- Classify and split documents: sorts documents by type and separates multiple documents within a single file.
Note: We recommend choosing this option when you need to organize and prepare documents for extraction.
- Select a document type.
- Select Upload or drag and drop your files inside the new document type. Wait for the upload to finish.
Certain complex files contain several document types. Our new model can detect where each sub-document starts and ends and classify each section accordingly.
- Click on Classify and Split Documents and upload your document packets. Wait for the document to finish uploading and processing.
- Select any documents from the upload section and click Split. This will open up the splitting annotation interface.
Note: If the project already has a trained model, uploaded documents will be pre-annotated using that model. This helps speed up annotation and allows you to view prediction results on new documents.
- Click New document type to create a document type for each item in your desired taxonomy. You can select a predefined document type or create a custom one.
Note: For custom document types, provide a name, a short description explaining its purpose, and comma-separated key indicators (such as unique fields or terms) that help identify it.
- Indicate where documents should be separated. Assign each page range to a document type using the dropdown menu. Once you have finished annotating the document, click Confirm.
Note: Clicking Confirm triggers document processing. After processing, each sub-document will appear under its corresponding document type in the Build section.
Note: Each sub-document moved to a document type will get pre-annotated with the schema of the document type.
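The splitting annotation described above amounts to assigning consecutive page ranges of a packet to document types. The following is a minimal Python sketch of that idea, useful for planning a split on paper before annotating. It is not part of the product: the `PageRange` class and `validate_split` function are hypothetical names, and the rule that every page must belong to exactly one range is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageRange:
    """One planned sub-document: an inclusive, 1-based page range and its type."""
    first_page: int
    last_page: int
    document_type: str


def validate_split(ranges: list[PageRange], total_pages: int) -> list[str]:
    """Return a list of problems; an empty list means the plan is consistent.

    Assumption (not stated in the guide): ranges must cover every page of the
    packet exactly once, in the original page order.
    """
    problems = []
    expected_start = 1
    for r in sorted(ranges, key=lambda r: r.first_page):
        if r.first_page > r.last_page:
            problems.append(f"range {r.first_page}-{r.last_page} is reversed")
        if r.first_page != expected_start:
            problems.append(
                f"expected a range starting at page {expected_start}, "
                f"got {r.first_page}"
            )
        expected_start = r.last_page + 1
    if expected_start != total_pages + 1:
        problems.append(
            f"ranges end at page {expected_start - 1}, packet has {total_pages}"
        )
    return problems


# Example: an 8-page packet planned as three sub-documents.
packet = [
    PageRange(1, 3, "Invoice"),
    PageRange(4, 4, "Receipt"),
    PageRange(5, 8, "Purchase order"),
]
print(validate_split(packet, total_pages=8))  # prints []
```

A gap or overlap between ranges shows up as a message in the returned list, which makes it easy to spot a page that was left unassigned.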
Model training
Model training is triggered only after:
- At least five sub-documents have been created and annotated;
- A document has been confirmed.
The training status can be viewed in the upper right-hand corner of the Classification pane.
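The training conditions above can be expressed as a simple check, for example when tracking annotation progress in your own notes. This is a sketch only: the function name is hypothetical, and it assumes that both listed conditions must hold at the same time, which the guide does not state explicitly.

```python
MIN_ANNOTATED_SUBDOCUMENTS = 5  # threshold stated in the guide


def training_can_start(annotated_subdocuments: int, confirmed_documents: int) -> bool:
    """Assumption: BOTH conditions from the list above are required."""
    return (
        annotated_subdocuments >= MIN_ANNOTATED_SUBDOCUMENTS
        and confirmed_documents >= 1
    )


print(training_can_start(annotated_subdocuments=5, confirmed_documents=1))  # prints True
```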
- The maximum document size is 160MB or 500 pages.
- Pages cannot be reordered or deleted.
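If you prepare packets in bulk, it can help to screen them against the size and page limits before uploading. The sketch below assumes that a document must satisfy both limits and reads "160MB" as 160,000,000 bytes (the more conservative interpretation); neither detail is confirmed by the guide. The function name is hypothetical, and the page count is passed in because obtaining it depends on the file format.

```python
MAX_SIZE_BYTES = 160 * 1000 * 1000  # assumption: "160MB" read as decimal megabytes
MAX_PAGES = 500                     # page limit stated in the guide


def within_upload_limits(size_bytes: int, page_count: int) -> bool:
    """Return True when a document respects both limits (assumed to apply together)."""
    return size_bytes <= MAX_SIZE_BYTES and page_count <= MAX_PAGES


print(within_upload_limits(size_bytes=12_500_000, page_count=120))  # prints True
```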
Splitting and classification predictions
Whenever a new model is trained, all documents within the project receive predictions from the trained model. This allows you to review the performance of the classification model.
The “Type” column displays the ground truth, which is the document type as it was annotated. The “Predicted type” column shows the type predicted by the model.
By default, only the document packets are displayed in the UI. To view the sub-documents within each packet, click View and select the Include sub-documents checkbox.
Predictions can also be viewed in the annotation interface by enabling the “Show Prediction” toggle.
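To summarize how well the predictions match the annotations, you can compare the Type and Predicted type values pair by pair. The following sketch assumes you have noted those pairs down yourself; it does not rely on any export feature, and `classification_summary` is a hypothetical helper name.

```python
from collections import Counter


def classification_summary(rows):
    """Compare annotated and predicted types.

    rows: iterable of (type, predicted_type) pairs.
    Returns (accuracy, mismatches), where mismatches counts each
    (type, predicted_type) pair that disagrees.
    """
    rows = list(rows)
    correct = sum(1 for actual, predicted in rows if actual == predicted)
    mismatches = Counter(
        (actual, predicted) for actual, predicted in rows if actual != predicted
    )
    accuracy = correct / len(rows) if rows else 0.0
    return accuracy, mismatches


# Example values, made up for illustration.
rows = [
    ("Invoice", "Invoice"),
    ("Invoice", "Receipt"),
    ("Receipt", "Receipt"),
    ("W2", "W2"),
]
accuracy, mismatches = classification_summary(rows)
print(accuracy)  # prints 0.75
```

The mismatch counter shows which document types the model confuses most often, which can indicate where more annotated examples may help.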