Create a new dataset
Communications Mining User Guide
Last updated Nov 7, 2024
User permissions required: ‘Datasets admin’.
To create a new dataset:
Go to the Datasets page and select New Dataset, which opens a modal for creating the dataset.
New Dataset modal
Complete the form with all the relevant information, then select Next to progress through each step:
- Add a title in the Dataset Name field to describe the dataset that you are creating.
- Enter a descriptive name in the API Name field, using hyphens instead of spaces, e.g. zendesk-cs-chats.
- From the drop-down menu, select the Project that the dataset should be in. You can assign the dataset to any of the projects that you are a member of.
- Select an Existing source from the drop-down list or add a new one. To add a new data source:
- Tick the New source radio button.
- Enter the Source Name and API Name. You can't change the API name once added.
- Note: You can add a new source only if you have the Source Admin permission.
- Note: If you have the Tenant Admin permission, you can create a new project. Select the Create new option from the drop-down list:
- Add the new project's Title and API name, then select Save.
Note: Once you add a new project, you are automatically set as the project founding user and have all the permissions in that project. The term founding user will soon change to project owner.
- Set the Model language(s)
- Confirm the model language (for example, English) that matches the language of your data. If you select a Multilingual model, see the Multilingual sources and datasets page for more details.
- Define labels
- To import labels, select the Import from a dataset option from the Import labels drop-down list, then choose a dataset. This copies only the labels and descriptions from the existing dataset. To copy an entire dataset, select Duplicate from the Datasets page.
- Add Additional settings
- Add any pre-trained labels to your dataset, for example chaser, urgent, or out of office. You do not have to enable any during dataset creation; you can enable them later on the dataset settings page.
- Set the sentiment and language(s) of the dataset:
- Enable or disable sentiment analysis. With sentiment analysis enabled, every label in the taxonomy has an associated positive or negative sentiment. To understand why you would or wouldn't enable it, see the Enabling sentiment on a dataset page.
Select Create to create the dataset.
Note:
- You can add up to 20 individual sources to a dataset in the GUI.
- Sources can sit in a different project to a dataset. As long as users have the appropriate permissions in each project, they will be able to see the messages and annotate as usual.
- If there are multiple sources in a dataset, they should share a similar intended purpose for your analysis.
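If you prepare dataset details outside the GUI before filling in the form, the two mechanical rules on this page (hyphens instead of spaces in the API Name, and at most 20 sources per dataset in the GUI) can be checked with a short script. The following is a minimal sketch, not part of the product: the function names are hypothetical, and the lower-casing and the set of allowed characters are assumptions based only on the zendesk-cs-chats example.

```python
import re

# Limit stated in the note above; it applies to the GUI.
MAX_GUI_SOURCES = 20


def to_api_name(title: str) -> str:
    """Turn a human-readable title into a hyphenated API-style name.

    Assumption: lower-case letters, digits and hyphens are acceptable,
    as in the documented example "zendesk-cs-chats".
    """
    # Replace every run of other characters (spaces included) with one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    # Drop any hyphen left at the start or end.
    return slug.strip("-")


def within_gui_source_limit(source_names: list[str]) -> bool:
    """Return True if the number of sources fits the GUI limit."""
    return len(source_names) <= MAX_GUI_SOURCES


if __name__ == "__main__":
    print(to_api_name("Zendesk CS Chats"))  # zendesk-cs-chats
    print(within_gui_source_limit(["chat-source", "email-source"]))  # True
```

Remember that a source's API name cannot be changed once added, so checking the name before you submit the form avoids having to recreate the source.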