Understanding entities
What are entities?
Entities are additional elements of structured data that can be extracted from the messages in your dataset. They include data points such as monetary quantities, dates, currency codes, email addresses and URLs, as well as many other industry-specific categories (see below for an example).
Unlike labels, most entities (all except those trained from scratch) can be predicted by the platform as soon as they are enabled, because it can identify them from their typical, or in some instances very specific, format and from a training set of similar entities.
As with labels, users can accept entity predictions that are correct and reject those that are incorrect, which improves the model’s ability to identify them in future.
Types of entities
There are currently two main types of entities:
- Pre-trained entities, which are typically based on a set of standard or custom-defined rules, e.g. Monetary Quantity, URL, and Date
- Machine-learning-based entities, which a user trains from scratch in the same way they would train labels
Trainable versus non-trainable entities
All entities are either 'trainable' by nature (entities trained from scratch), or can be made 'trainable' when they're enabled (all other entity kinds).
'Trainable' entities are those that will update live in the platform based on training provided by users. For more detail on training entities, see here.
If you enable training on a pre-trained entity that is typically based on a set of standard or custom-defined rules, you can refine the platform's understanding of that entity within the parameters of those rules. In effect, further training can narrow what the platform treats as that entity, but cannot widen it beyond what the rules allow.
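The narrowing effect described above can be pictured with a minimal Python sketch. It is an illustration only, not the platform's implementation: the regular expression `MONEY_RULE`, the functions `rule_candidates` and `predict`, and the `keep` callback standing in for what training has taught are all hypothetical.

```python
import re
from typing import Callable, List

# Hypothetical fixed rule for a pre-trained entity:
# a currency symbol followed by an amount.
MONEY_RULE = re.compile(r"[£$€]\d+(?:\.\d{2})?")


def rule_candidates(text: str) -> List[str]:
    """Return every span that the fixed rule allows."""
    return MONEY_RULE.findall(text)


def predict(text: str, keep: Callable[[str], bool]) -> List[str]:
    """Apply a learned filter on top of the rule.

    The filter can only drop rule candidates. It has no way to add a
    span that the rule itself does not match.
    """
    return [span for span in rule_candidates(text) if keep(span)]


text = "Invoice total £20.00, plus a $5 voucher, about twenty pounds overdue."

# Before training: every rule match is predicted.
untrained = predict(text, keep=lambda span: True)

# After (hypothetical) training in which users rejected dollar amounts.
trained = predict(text, keep=lambda span: span.startswith("£"))
```

In this sketch "twenty pounds" is never predicted, whatever the filter does, because it falls outside the rule. Training only decides which of the rule's own matches survive.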
This is because many of these entities, like dates (e.g. 'tomorrow') and monetary quantities (e.g. £20), need to be normalised into a structured data format for downstream systems. Similarly, entities like ISINs or CUSIPs must follow a set format, so the platform should not be taught to predict anything that does not conform to that format.
When any trainable entity is assigned, the platform looks at both the text of the entity and its context within the rest of the communication, i.e. what appears before and after the entity value (in the same paragraph, and in the paragraphs directly above and below). It learns to better predict the entity from the values themselves and from how each value appears within the context of the communication.
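To make the point about normalisation and fixed formats concrete, here is a minimal Python sketch. The function names, the symbol-to-currency mapping and the output shapes are assumptions for illustration and do not describe how the platform normalises entities. The ISIN check follows the standard ISIN structure: two letters, nine alphanumeric characters and a final check digit verified with the Luhn algorithm.

```python
import re
from datetime import date, timedelta
from decimal import Decimal
from typing import Optional

# Assumed mapping; a real system would need to disambiguate symbols like "$".
CURRENCY_SYMBOLS = {"£": "GBP", "$": "USD", "€": "EUR"}
RELATIVE_DAYS = {"yesterday": -1, "today": 0, "tomorrow": 1}
ISIN_PATTERN = re.compile(r"[A-Z]{2}[A-Z0-9]{9}[0-9]")


def normalise_money(text: str) -> Optional[dict]:
    """Turn a string such as '£20' into a structured amount, or None."""
    match = re.fullmatch(r"([£$€])\s?(\d+(?:\.\d+)?)", text.strip())
    if match is None:
        return None
    return {
        "currency": CURRENCY_SYMBOLS[match.group(1)],
        "amount": Decimal(match.group(2)),
    }


def normalise_relative_date(text: str, reference: date) -> Optional[date]:
    """Resolve a word such as 'tomorrow' against a reference date."""
    offset = RELATIVE_DAYS.get(text.strip().lower())
    if offset is None:
        return None
    return reference + timedelta(days=offset)


def is_valid_isin(value: str) -> bool:
    """Check the ISIN shape and its Luhn check digit."""
    if not ISIN_PATTERN.fullmatch(value):
        return False
    # Letters become numbers (A=10 ... Z=35); digits stay as they are.
    digits = "".join(str(int(ch, 36)) for ch in value)
    total = 0
    for position, ch in enumerate(reversed(digits)):
        digit = int(ch)
        if position % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0
```

A value that fails `is_valid_isin`, or text for which the normalisers return `None`, has no structured form to pass downstream, which is why teaching the platform to accept such values would not be useful.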
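The context window described above (the entity's own paragraph plus the one above and the one below) can be sketched in Python as follows. This is an illustration under stated assumptions: the function name `context_paragraphs` is hypothetical, and splitting paragraphs on blank lines is an assumption, since the platform's actual paragraph detection is not described here.

```python
from typing import List


def context_paragraphs(message: str, entity_start: int) -> List[str]:
    """Return the paragraph containing the entity, plus its neighbours.

    `entity_start` is the character offset of the entity in `message`.
    Paragraphs are assumed to be separated by a blank line.
    """
    separator = "\n\n"
    paragraphs = message.split(separator)
    offset = 0
    for index, paragraph in enumerate(paragraphs):
        end = offset + len(paragraph)
        if offset <= entity_start < end:
            first = max(index - 1, 0)
            # Slice covers the paragraph above, the entity's own, and the one below.
            return paragraphs[first:index + 2]
        offset = end + len(separator)
    raise ValueError("entity_start does not fall inside any paragraph")


message = "Hi team,\n\nPlease settle £20 by tomorrow.\n\nThanks,\n\nSam"
window = context_paragraphs(message, message.index("£20"))
```

Here `window` holds the greeting, the paragraph containing "£20" and the sign-off line "Thanks,", while the final paragraph "Sam" is outside the window.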
If a pre-trained entity is not set as trainable (see detail on enabling entities on a dataset here), users can still accept or reject the entity predictions they see in their dataset. These entities are updated and refined offline using the in-platform feedback that users provide, so it’s still helpful for users to accept or reject them when reviewing messages.