Coverage
Coverage is a term frequently used in Machine Learning and relates to how well a model 'covers' the data it's used to analyse. In Communications Mining™, this relates to the proportion of messages in the dataset that have informative label predictions, and is presented in Validation as a percentage score.
'Informative labels' are those labels that the platform understands to be useful as standalone labels, by looking at how frequently they're assigned with other labels. Labels that are always assigned with another label, e.g. parent labels that are never assigned on their own or 'Urgent' if it's always assigned with another label, are down-weighted when the score is calculated.
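The definition above can be illustrated with a simplified sketch. It is not the platform's actual calculation: it assumes a hypothetical confidence threshold of 0.5, treats the set of informative labels as given, and excludes non-informative labels outright rather than down-weighting them. The function name, label names, and confidence values are all invented for illustration.

```python
from typing import Dict, Iterable, Set


def coverage_percent(
    predictions: Iterable[Dict[str, float]],
    informative_labels: Set[str],
    threshold: float = 0.5,  # hypothetical cut-off, not a platform setting
) -> float:
    """Return the percentage of messages with at least one informative
    label predicted at or above the threshold.

    Each item in `predictions` maps label name -> confidence for one message.
    """
    predictions = list(predictions)
    if not predictions:
        return 0.0
    covered = sum(
        1
        for labels in predictions
        if any(
            name in informative_labels and confidence >= threshold
            for name, confidence in labels.items()
        )
    )
    return 100.0 * covered / len(predictions)


# Illustrative data: four messages with made-up predictions.
messages = [
    {"Urgent": 0.9, "Payment > Late": 0.8},  # covered by "Payment > Late"
    {"Urgent": 0.95},                        # only a non-informative label
    {"Account > Close": 0.7},                # covered
    {"Payment > Late": 0.2},                 # prediction below the threshold
]
informative = {"Payment > Late", "Account > Close"}

score = coverage_percent(messages, informative)
print(f"Coverage: {score:.0f}%")
```

In this toy example 'Urgent' is left out of the informative set because it never carries meaning on its own, so the second message does not count as covered even though it has a confident prediction.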
The visual below gives an indication of what low coverage versus high coverage would look like across an entire dataset. Imagine the shaded circles are messages that have informative label predictions.
As a metric, coverage is a very helpful way of understanding if you've captured all of the different potential concepts in your dataset, and whether you've provided enough varied training examples for them so that the platform can effectively predict them.
In almost all cases, the higher a model's coverage, the better it performs, but coverage should not be considered in isolation when assessing model performance.
It is also very important that the labels in the taxonomy are healthy, meaning that they have high average precision and no other performance warnings, and that the training data is a balanced representation of the dataset as a whole.
If your labels are unhealthy or the training data is not representative of the dataset, then the coverage score that the platform calculates for your model will be unreliable.
High coverage is particularly important if you are using your model to drive automated processes.
For more detail on model coverage, and how to check your model's coverage, see here.