Communications Mining User Guide
Datasets
Following the IXP migration, the original Datasets page in Communications Mining has been replaced by the IXP homepage, which includes the Communications Mining datasets.
To get to the Datasets page, select the IXP service from Automation Cloud. The Datasets page is displayed by default because the Communications Data capability, which includes Communications Mining, is preselected.
From the Datasets page, you can:
- View all datasets you have access to.
- Edit or delete datasets.
Note: You must have the Datasets admin permission assigned to edit or delete datasets.
- Navigate to other IXP capabilities. For more details on each capability, check the corresponding page in the IXP Overview guide.
Search for a specific dataset by name using the Search option.
Select a dataset from the list to open its Communications Mining page. On this page, you can work with the dataset through the following tabs: Train, Discover, Explore, Validation, Reports, Models, Streams, and Settings.
Copy an existing dataset
When you create a new dataset, you can choose to make a carbon copy of an existing dataset. This means you copy over the same sources, general fields, sentiment selection, labels, and reviewed examples from the dataset you are copying.
You can then work on the copy, which requires a different name, and make changes to it without affecting the original dataset.
We recommend copying an existing dataset in the following scenarios:
- You want to make major changes to your model, such as to the dataset structure, and want to preserve the original dataset in case you need to revert to it.
- You want to reuse the annotation work already done on the original dataset and create a new dataset to which you can add further sources of a similar nature.
To copy an existing dataset, select the ellipsis next to the dataset on the homepage, select Duplicate, and then provide the following details for the new dataset:
- Dataset Name
- API Name
- Project
- Model language - choose between English and Multilingual.
Dataset settings page
Each dataset has its own settings page, which contains useful information about that dataset. To access the Settings page, select the ellipsis next to a specific dataset, and then select Dataset Settings.
The page is split into the following tabs:
- Dataset - update the global settings of the dataset, including the title, description, and sources.
- Taxonomy - create, read, update, and delete labels, as well as their descriptions, extraction fields, general fields, and field types. You can also download the complete label taxonomy.
- Statistics - view annotating statistics and the message metadata properties.
Delete a dataset
To delete a dataset, do one of the following:
- Select the ellipsis next to a specific dataset on the homepage, and then select Delete.
- Select the Delete dataset permanently option in the Settings tab.
After signing in, you are redirected to the Datasets page.
You can also return to this page at any time by selecting the Communications Mining™ logo at the top of the page.
From the Datasets page you can:
- View all datasets you have access to.
- Edit or delete datasets.
Note: You must have the Datasets admin permission assigned to edit or delete datasets.
- Navigate to other pages in the platform.
Select one of the options listed on a dataset card, such as Explore, Train, or Reports, to go straight to that page of the dataset.
To reduce the number of datasets displayed, use the drop-down menu to filter the list to a specific project that you are a member of.
In addition, you can search for a specific dataset by name using the Search option.
Selecting a dataset
Each dataset card shows the following information:
- The dataset title and description
- The project the dataset is linked to and the dataset name (project/name)
- The sources connected to the dataset
- The model family (language)
- Whether sentiment analysis is enabled
- When the dataset was last changed (hover over this to see when the dataset was created)
Select Explore, Train, or Reports beneath the dataset information to navigate to those pages.
Copy an existing dataset
- Select the Duplicate option to copy an existing dataset. This automatically selects the same sources and sentiment selection as that dataset.
- Select all the sources that you want to connect to the dataset, including any additional sources if you are duplicating an existing dataset.
What does copying a dataset mean and why would you do it?
When you create a new dataset, you can choose to create a carbon copy of a pre-existing dataset. This means that you copy over the same sources, general fields, sentiment selection, labels, and reviewed examples as the dataset you are copying.
You can then work on the copied dataset (which requires a different name) and make changes to it freely without affecting the original.
There are two main reasons why you might want to do this:
- You want to make major changes to your model, such as to the dataset structure, and want to preserve the original dataset in case you need to revert to it.
- You want to reuse the annotation work already done on the original dataset and create a new dataset to which you can add further sources of a similar nature.
Dataset settings page
In addition to the Datasets overview page, each dataset has its own settings page. To access it, select the dataset and then select Settings.
A dataset’s settings page contains useful information about the dataset and is where you can perform various actions.
The page is split into three tabs:
- Dataset - where you can update the global settings of the dataset, including title, description and sources.
- Taxonomy - where you can create, read, update and delete labels and their descriptions, extraction fields, general fields, and field types. You can also download the label taxonomy in full.
- Statistics - where you can see annotating statistics and the message metadata properties.