
Clipboard AI user guide
Data extractors
Data extractors retrieve the relevant information from different documents and other sources.
Documents fall into three main categories:
- Structured documents - have a fixed format and are easy to process, guiding you to fill in the required data in precise fields. These documents are designed to capture a specific type of data. Examples of structured documents: tax forms, surveys, questionnaires, etc.
- Semi-structured documents - have both fixed and variable parts. Unlike structured documents, they are not bound to specified data fields, but they contain a predictable set of information. For example, an invoice always contains an invoice number or other unique identifier and a date, but their placement might vary depending on the provider. These documents mainly contain label:value pairs and may contain paragraphs as well. Examples of semi-structured documents: invoices, receipts, purchase orders, utility bills, etc.
- Unstructured documents - the information is not organized according to a fixed format. These documents mainly contain plain text, and most of the data sits in unstructured form inside that text. Examples of unstructured documents: contracts, emails, health records, etc.
Data extractors also differ in how they extract data from documents. In this respect, there are two types of extractors:
- Fixed output extractors - trained to extract a predefined set of information from a document; for example the Invoice extractor always tries to extract the company name, address, total sum, etc.
- Question-answering extractors - trained to answer questions based on a given context. These extractors rely on natural language understanding to parse the text, identify the exact value that needs to be extracted, and provide an appropriate answer or choose an option from a given list.
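The difference between the two extractor types is mostly one of interface, which a short sketch can make concrete. The Python below is illustrative only and is not Clipboard AI's implementation: the function names `extract_fixed` and `answer_question`, the `INVOICE_FIELDS` patterns, and the keyword-overlap scoring are all assumptions, with regular expressions and word matching standing in for the trained models.

```python
import re
from typing import Dict, List, Optional

# Hypothetical field patterns. A real fixed-output extractor uses a trained
# model, not regular expressions; these only illustrate the fixed schema.
INVOICE_FIELDS = {
    "invoice_number": re.compile(r"Invoice\s*(?:Number|No\.?|#)\s*:?\s*(\S+)", re.IGNORECASE),
    "date": re.compile(r"Date\s*:\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    "total": re.compile(r"Total\s*:\s*\$?([\d,]+\.\d{2})", re.IGNORECASE),
}


def extract_fixed(text: str) -> Dict[str, Optional[str]]:
    """Fixed output: always return the same keys, with None for missing fields."""
    result: Dict[str, Optional[str]] = {}
    for name, pattern in INVOICE_FIELDS.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result


def _words(text: str) -> set:
    """Lowercased set of word tokens in the text."""
    return set(re.findall(r"\w+", text.lower()))


def answer_question(
    question: str, context: str, options: Optional[List[str]] = None
) -> Optional[str]:
    """Question answering: toy stand-in that picks the sentence sharing the
    most words with the question, then optionally maps it to a given option."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", context) if s.strip()]
    if not sentences:
        return None
    question_words = _words(question)
    best = max(sentences, key=lambda s: len(question_words & _words(s)))
    if options:
        # Choose the first option that literally appears in the best sentence.
        for option in options:
            if option.lower() in best.lower():
                return option
        return None
    return best
```

With this sketch, `extract_fixed` returns the keys `invoice_number`, `date`, and `total` for any input, whereas `answer_question` returns whatever the question asks for, so the caller decides what to extract at query time rather than at training time.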
Clipboard AI uses the following set of data extractors:
- Universal extractor
- Specific documents extractors
- Plain-text extractor
- Tables and name-value pairs extractor
The Universal extractor
The Universal extractor is the default option for extracting data from your documents. It scans your data (plain text or tabular) and selects the best approach to extract it. It combines the existing extractors and also accepts queries to find the best match in your data.
Learn how to interact with the Universal extractor.
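As a rough illustration of the "scan, then decide" behavior described above, the sketch below routes input to one of two extractors based on whether it looks tabular. How the Universal extractor actually makes this decision is not documented here; the delimiter-counting heuristic, the function names `looks_tabular` and `choose_extractor`, and the returned labels are all assumptions made for the example.

```python
def looks_tabular(text: str, min_rows: int = 2) -> bool:
    """Assumed heuristic: data is tabular when every non-empty line contains
    the same non-zero number of one delimiter (tab, pipe, or comma)."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) < min_rows:
        return False
    for delimiter in ("\t", "|", ","):
        counts = [line.count(delimiter) for line in lines]
        if counts[0] > 0 and all(count == counts[0] for count in counts):
            return True
    return False


def choose_extractor(text: str) -> str:
    """Return a hypothetical label for the extractor that should handle the text."""
    return "tables_and_name_value_pairs" if looks_tabular(text) else "plain_text"
```

A heuristic this simple misclassifies some inputs, for example several lines of prose that happen to contain one comma each, which is one reason a production system would rely on trained models rather than delimiter counts.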
Specific documents extractors
The Specific documents extractors are a fixed-output set of extractors trained on specific document types. Each document type is extracted using its corresponding Document Understanding machine learning model, as follows:
- Invoice
- Passport
- Receipt
- ID card
- W-2 form
- Utility bill
- Purchase order
- Web/paper forms
You can select the preferred Document Understanding model based on your document type.
Plain-text extractor
The Plain-text extractor is a question-answering extractor that uses GPT-3 to retrieve data from plain-text documents, web pages, emails, etc. It can be used either for semi-structured documents, to handle the variable parts, or for unstructured documents, where the layout is irrelevant.
This extractor supports semantic understanding and, in addition to answering questions, offers other advanced capabilities such as summarization, machine translation, document type classification, and sentiment detection.
Tables and name-value pairs extractor
The Tables and name-value pairs extractor is a fixed-output extractor that works best for documents containing tables and name:value pairs.
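To show what name:value pairs look like once extracted, here is a minimal sketch that reads `Name: Value` lines into a dictionary. It is not the extractor's actual method: the `PAIR` pattern and the function `parse_name_value_pairs` are hypothetical, and the sketch assumes one pair per line with a colon separator.

```python
import re
from typing import Dict

# Assumed layout: one "Name: Value" pair per line. The name is everything
# before the first colon, the value is everything after it.
PAIR = re.compile(r"^\s*([^:\n]+?)\s*:\s*(.+?)\s*$")


def parse_name_value_pairs(text: str) -> Dict[str, str]:
    """Collect name:value pairs, skipping lines that do not contain a pair."""
    pairs: Dict[str, str] = {}
    for line in text.splitlines():
        match = PAIR.match(line)
        if match:
            pairs[match.group(1)] = match.group(2)
    return pairs
```

Splitting on the first colon keeps values such as `10:30` intact, so a line like `Time: 10:30` yields the name `Time` and the value `10:30`.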