Communications Mining User Guide
Training using Search (Discover)
User permissions required: ‘View Sources’ AND ‘Review and annotate’.
The 'Search' function in Discover lets you search for key terms and phrases. Results show exact matches first, where they exist, followed by partial matches. Use it to find alternative terms and ways of expressing the same intent or concept for each label. This is useful when you know a relevant, common term or expression that has not yet appeared in any of the clusters and you want to pin a couple of examples.
Do not use Search to annotate a large number of examples per search term or per label - pin only a few of each.
Let’s look at an example. The cluster below is clearly about the location of the hotel, and a ‘Location’ label has been predicted for it. If we annotated only messages containing this term, we could bias the model towards phrases around the word ‘Location’ or similar, so we should use the Search feature to find alternative ways of expressing the same concept:
Possible alternative search terms for 'Location':
- Located
- Convenient
- Position
- Proximity
- Near
- Hotel position
- Location to transport
- Transport links
- Tourist attractions
- Close to transport
- Central
- Close to airport
- Near the airport
Searching for different terms
The example below shows how searching for alternative terms for ‘Location’ highlights messages that relate to the location of the hotel but express it differently. Annotating a few of these messages gives the model varied examples of ‘Location’.
Applying labels to search results
- Select ‘Search’ from the ‘Cluster’ drop-down menu in the Discover tab
- Enter your search term and press Enter or click the search icon
- Matching search terms appear highlighted in orange. The platform shows full matches followed by partial matches
- Apply every label that is relevant to each message, not just the label you searched for (e.g. the ‘Property > Staff’ label in the cluster above)
- DO NOT do this for a large number of messages for each label
Use this process sparingly, for each label whose topic can be expressed in varied ways. Other methods, covered in the Explore phase, also provide varied training examples without the potential to bias your model.
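To make the 'full matches followed by partial matches' ordering and the 'only a few of each' guidance more concrete, here is a minimal offline sketch in Python. It is an illustration only and does not reflect how the platform implements Search: the functions `rank_matches` and `pick_examples`, the `per_term` limit, and the sample messages are all hypothetical.

```python
import re


def rank_matches(messages, term):
    """Return messages matching `term`: whole-phrase matches first, then partial matches.

    Hypothetical helper for illustration; not the platform's search logic.
    """
    phrase = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    words = term.lower().split()
    full, partial = [], []
    for msg in messages:
        tokens = set(re.findall(r"\w+", msg.lower()))
        if phrase.search(msg):
            full.append(msg)      # the whole search term appears as a phrase
        elif any(w in tokens for w in words):
            partial.append(msg)   # at least one word of the term appears
    return full + partial


def pick_examples(messages, terms, per_term=3):
    """Keep only a few results per search term (assumed limit) to avoid over-annotating one phrasing."""
    return {term: rank_matches(messages, term)[:per_term] for term in terms}


# Made-up sample messages, loosely based on the hotel example above.
messages = [
    "The hotel is close to airport shuttles and taxis.",
    "Great location, right in the centre.",
    "Very central and near the metro.",
    "Staff were friendly, and the airport transfer was easy.",
    "Breakfast was cold.",
]

examples = pick_examples(messages, ["close to airport", "central"], per_term=2)
for term, found in examples.items():
    print(term, "->", found)
```

In this sketch, a search for 'close to airport' lists the message containing the whole phrase before the message that only mentions 'airport', and each term contributes at most `per_term` messages for review.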