Taxonomy design best practice
Communications Mining User Guide
Last updated Nov 7, 2024
We recommend following these best practices to structure your taxonomy properly and ensure high model performance:
- Objectives alignment: Make sure each label serves a specific business purpose and is aligned to your defined objectives.
- Distinct: It’s important that each label is specific in what it's trying to capture and doesn’t overlap with other labels.
- Specific: Avoid broad, vague, or confusing concepts, as they are more likely to perform badly and less likely to provide business value. Where possible, split broad labels into multiple distinct labels. It’s better to start with labels that are too specific (i.e. more levels of hierarchy) and merge them later if needed than to have to break down very broad labels manually.
- Identifiable: Ensure each label is clearly identifiable from the text of the messages that it’s applied to.
- Parent label: Use a parent label when you expect a significant number of similar concepts to fall under the same broader topic.
- Child label: Make sure that every label nested under another label is a subset of that label.
- Hierarchy levels: In general, try not to add more than four levels of hierarchy as the model becomes increasingly complex to train.
- Label name: Don't spend too much time thinking of the perfect label name, as labels can always be renamed later.
- Label description: Add descriptions to your labels (by accessing Labels & General Fields in Settings) to keep annotation consistent, which is particularly helpful if several people are training the model.
- Uninformative: Create some non-value-adding labels, e.g. for thank-you emails, so you can tell the platform what is and isn’t important to analyse.
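
Two of the points above (hierarchy levels and label descriptions) can be checked mechanically on a draft taxonomy before you import it. The following is a minimal sketch, not part of the platform: the `Label` class and `check_taxonomy` function are hypothetical names, and it assumes hierarchy is written into the label name with a " > " separator (for example "Request > Address change"). Adjust the separator if your draft uses a different convention.

```python
from dataclasses import dataclass

SEPARATOR = " > "  # assumption: how nesting is written in a label name
MAX_LEVELS = 4     # from the guidance above: avoid more than four levels


@dataclass
class Label:
    """Hypothetical representation of one label in a draft taxonomy."""
    name: str
    description: str = ""


def check_taxonomy(labels):
    """Return a list of warnings for labels that break the mechanical rules."""
    warnings = []
    for label in labels:
        levels = len(label.name.split(SEPARATOR))
        if levels > MAX_LEVELS:
            warnings.append(
                f"{label.name}: {levels} levels (more than {MAX_LEVELS})"
            )
        if not label.description.strip():
            warnings.append(f"{label.name}: missing description")
    return warnings


# Example draft taxonomy (illustrative label names only)
draft = [
    Label("Request", "The sender asks us to do something"),
    Label("Request > Address change"),
    Label("Uninformative > Thank you", "Thank-you emails with no other content"),
]

for warning in check_taxonomy(draft):
    print(warning)
```

The other points, such as whether labels are distinct, identifiable from the message text, and aligned to your objectives, need human judgement and are not covered by a check like this.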