Communications Mining User Guide
Maintaining a model in production
Why is model maintenance important?
Creating a model that's fit to be deployed into a production environment requires an investment of time that is quickly paid back by the value of ongoing analytics and efficiency savings through automation.
If a model is not effectively maintained over the long term, the benefits it provides can gradually erode, as model performance will tend to decline without a small amount of supplementary training.
This is due to ‘concept drift’, which refers to the way the concepts a model is trying to predict can change in unforeseen ways over time, making its predictions less and less accurate.
In practice, this reflects how a business, and the way it communicates internally, with other businesses and with its customers, changes over time. If your model's training data is no longer representative of the way your business operates today, the model will perform worse when trying to identify concepts within your communications data.
Any model used in a production environment should therefore be effectively maintained to ensure continued high performance.
How do you maintain a model in production?
Maintaining a production model is a straightforward, low-effort process. The majority of the effort required has already been invested in creating your model's training data before it is deployed.
There are two main approaches to maintaining a model, both of which ensure that your model is provided with additional helpful and representative training examples:
- Exception training
- Using 'Rebalance' mode
1. Exception training
Any model used for automation purposes should have an exception process in place that identifies the messages the platform could not confidently or correctly predict (see here for more detail).
This is important because it allows you to quickly find and annotate the messages the platform struggled with, which improves the model's ability to predict similar messages in future.
Typically, an automation process will be set up to automatically flag messages with a user property that identifies them as exceptions. You can then filter to those messages in Explore and annotate them with the correct labels, to ensure that the platform can confidently and correctly identify similar messages in future.
This should form part of a regular process that aims to consistently improve the model. The more exceptions are captured and annotated, the better a model will perform over time, minimising the number of future exceptions and maximising the efficiency savings that an automation-focused model enables.
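As a rough illustration of how an automation might flag exceptions, the sketch below attaches an 'exception' user property to a message via the platform's API, so that those messages can later be filtered on in Explore. The endpoint path, property names and authentication details are assumptions made for illustration, not the documented API; refer to the API reference for the exact calls available in your tenant.

```python
# Illustrative sketch only: the endpoint path, property names and auth scheme
# below are assumptions, not the documented Communications Mining API.
import requests

API_BASE = "https://<your_tenant>/api/v1"  # hypothetical base URL
API_TOKEN = "<api-token>"                  # hypothetical bearer token

def flag_as_exception(source: str, message_id: str, reason: str) -> None:
    """Attach user properties marking a message as an automation exception."""
    response = requests.post(
        f"{API_BASE}/sources/{source}/messages/{message_id}",  # hypothetical endpoint
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json={
            "message": {
                "user_properties": {
                    # String user properties can later be used as filters in Explore.
                    "string:Automation Outcome": "Exception",
                    "string:Exception Reason": reason,
                }
            }
        },
        timeout=30,
    )
    response.raise_for_status()

# Example: an automation flags a message it could not route confidently.
flag_as_exception("my-project/my-source", "message-123", "Low label confidence")
```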
2. Using Balance and 'Rebalance' mode
Your model's 'Balance' rating is a component of its Model Rating. It reflects how similar your model's training data is to the dataset as a whole, i.e. how representative it is.
In theory, if the most recent data being added to a dataset over time is significantly different to the older data that was used to train the model, this would cause a drop in the similarity score that determines your model's Balance rating.
When doing exception training, it's important to check whether the model's similarity score drops. If it does, this should be addressed, as it could be an indication of concept drift and means that performance in production will ultimately fall.
The simplest way to correct a drop in the similarity score is to complete some training using 'Rebalance' mode.
To ensure that you train on the most recent data, representative of the kind of communications being received today, you can also apply a timestamp filter whilst training in 'Rebalance' mode, for example restricting it to the last 3 or 6 months. This ensures that your model is not relying solely on older training data that may not reflect changes in your business.
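If you script any part of this review workflow, the recency cutoff for such a filter is simple to derive. The sketch below computes a 'last 3 months' cutoff and uses it in a hypothetical query for recent messages; the endpoint and filter format are assumptions made for illustration, and the same recency filter can simply be applied in the UI whilst training in 'Rebalance' mode.

```python
# Illustrative sketch only: the query endpoint and filter format are assumptions,
# not the documented Communications Mining API.
from datetime import datetime, timedelta, timezone

import requests

API_BASE = "https://<your_tenant>/api/v1"  # hypothetical base URL
API_TOKEN = "<api-token>"                  # hypothetical bearer token

def recent_messages(dataset: str, months: int = 3, limit: int = 50) -> list[dict]:
    """Fetch messages received within roughly the last `months` months."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=30 * months)
    response = requests.post(
        f"{API_BASE}/datasets/{dataset}/messages/query",  # hypothetical endpoint
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json={
            # Only consider messages newer than the cutoff, so any supplementary
            # training reflects how the business communicates today.
            "filter": {"timestamp": {"minimum": cutoff.isoformat()}},
            "limit": limit,
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json().get("messages", [])

for message in recent_messages("my-project/my-dataset", months=3):
    print(message.get("id"), message.get("timestamp"))
```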