Checkboxes and signatures
Checkboxes and signatures are two elements that play crucial roles in various types of documents, ranging from contractual agreements to registration forms. Understanding how to correctly annotate checkboxes and signatures is important in making the most out of your model.
Multiple-choice fields come in two types:
- Mutually exclusive checkboxes, where only one option can be selected.
- Non-mutually exclusive checkboxes, where you can select more than one option.
An important aspect to consider is the number of choices offered within a given multiple-choice field. In some cases there could be a single option, where the checkbox is either checked or not. However, in many instances, there may be 10, 20, or even more options, often organized into a grid or table format, which is common for health forms.
In terms of annotating these diverse multiple-choice fields, there are four primary methods you can use.
Let's use an example to understand how you can annotate the options: a form with two checkboxes, one for 2018 and one for 2019.
The first method uses a single field for the whole choice. This approach has the advantage that you have a single field, which requires less data. It also doesn't depend on the successful detection of checkboxes. For example, if a checkbox is mistakenly detected as the letter X, the model can still learn to recognize that it signifies the selection of the option next to it.
However, a potential disadvantage is that the options need to be roughly equally represented in the training data, which might not always be the case. For instance, if 90% of the documents in your dataset have 2018 checked, the model's performance could suffer to the point that this approach fails. The problem gets worse when you have more options, because some of them are almost always rare. In these cases, you may need to create synthetic documents with the rare options checked to balance things out.
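The balance concern above can be checked before training. The following is a minimal sketch, assuming you can export the annotated value of the single field for each training document as a plain list of strings. The function names and the 25% threshold are illustrative assumptions, not part of any UiPath API.

```python
from collections import Counter


def option_shares(selected_options):
    """Return the fraction of documents in which each option was selected."""
    counts = Counter(selected_options)
    total = sum(counts.values())
    return {option: count / total for option, count in counts.items()}


def underrepresented(selected_options, all_options, min_share=0.25):
    """List options whose share of the dataset falls below min_share.

    min_share is an arbitrary illustrative threshold, not a documented value.
    Options that never appear in the dataset have a share of 0.
    """
    shares = option_shares(selected_options)
    return [o for o in all_options if shares.get(o, 0.0) < min_share]


# Example: 90% of the documents have 2018 checked, as in the text above.
labels = ["2018"] * 9 + ["2019"]
rare = underrepresented(labels, ["2018", "2019"])
```

Options returned by `underrepresented` are the ones for which additional documents with that option checked would help balance the dataset.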
The second method uses one field per option. In the previous example, you would create two distinct fields: one labelled 2018, where you consistently annotate the checkbox for that year, and another labelled 2019, where you consistently annotate the checkbox for 2019, whether it's checked or not. The advantage of this method is that balance becomes less critical: even if one choice is selected 90% of the time, the model can still learn to identify the checkboxes because they hold fixed positions.
The downside is that you have two fields instead of one. While this may not pose a considerable issue when dealing with two options, handling 10-20 options and consequently creating 10-20 fields rather than a single one can significantly complicate the annotation process. Additionally, this also leads to a more challenging model training process, requiring more training data.
Another drawback is the occasional incorrect detection of the checkbox, which can lead to the need for more complex logic in the workflow to manage all the returned X, V, or K characters. In some cases, the OCR might even merge the checkbox with the word next to it, like X2018, requiring even more complex RPA logic to handle this situation.
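The post-processing described above might look roughly like the sketch below. It is written in Python for illustration only; a real workflow would implement the same checks in its own RPA logic. The function name, the set of check marks, and the assumption that an unchecked box yields either empty text or the bare option label are all hypothetical.

```python
# Characters the OCR may return in place of a checked box (taken from the text above).
CHECK_MARKS = ("X", "V", "K")


def interpret_checkbox_text(raw, option_labels):
    """Return (is_checked, option_label_or_None) for one extracted value.

    Assumption: an unchecked box yields empty text or just the option label.
    Any other unexpected text is treated as unchecked.
    """
    text = (raw or "").strip()
    if not text:
        return False, None
    # A lone check mark such as "X" or "v".
    if text.upper() in CHECK_MARKS:
        return True, None
    # A check mark merged with the adjacent option, such as "X2018".
    remainder = text[1:].strip()
    if text[0].upper() in CHECK_MARKS and remainder in option_labels:
        return True, remainder
    # The bare option label with no check mark in front of it.
    if text in option_labels:
        return False, text
    return False, None


options = ["2018", "2019"]
merged = interpret_checkbox_text("X2018", options)
```

Note that a label which itself starts with X, V, or K is only read as a merged check mark if the rest of the text is also a known option, which keeps false positives low for short option lists.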
The third method uses multi-value fields. These make annotation easier, and they are not affected by imbalances in checked options or by a wide variety of selections. However, these fields are still subject to the accuracy of checkbox detection and to the risk of checkboxes being merged with adjoining options. OCR errors are very hard to defend against.
The fourth method also simplifies the annotation process and is less sensitive to checkbox detection errors. However, it may be more sensitive to unbalanced options.
All of these options may be appropriate in some situations. Initially, the first option was preferred. As the accuracy of checkbox detection in UiPath® Document OCR has improved, options two and three are now preferred.
Signatures can be identified using UiPath Document OCR, allowing ML models to detect them directly.
You can annotate a signature like any other field in your document. Once the signature is identified by UiPath Document OCR, the ML model learns to recognize the field as a signature.
At inference time, the signature is returned as it appears in the document. You then have to convert this value into a Boolean field (Yes/No) using RPA logic.
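The conversion to a Boolean can be as simple as checking whether anything was extracted for the signature field. The sketch below is in Python for illustration only. The function name, the optional `confidence` argument, and the 0.5 threshold are assumptions rather than documented behavior.

```python
def signature_present(extracted_value, confidence=None, min_confidence=0.5):
    """Map an extracted signature field to True (signed) or False (not signed).

    extracted_value: text returned for the signature field, or None if missing.
    confidence: optional extraction confidence; hypothetical, skipped if None.
    """
    # Nothing extracted, or only whitespace: treat the document as unsigned.
    if extracted_value is None or not extracted_value.strip():
        return False
    # Optionally reject low-confidence detections (threshold is illustrative).
    if confidence is not None and confidence < min_confidence:
        return False
    return True


signed = signature_present("J. Smith")
unsigned = signature_present("   ")
```

Whether to apply a confidence threshold, and what value to use, depends on how costly a missed or a falsely detected signature is in your process.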