Datasets page
User permissions required: View Sources.
Datasets overview page
After signing in, you can see the Datasets overview page.
Alternatively, you can navigate to this page anytime by clicking the UiPath® Communications Mining™ logo in the top left of your page.
From this page you can:
- See all the Datasets you have access to.
- Edit or delete these Datasets. User permissions required: Datasets admin.
- Navigate to other pages in the platform.
You can navigate straight into a dataset by clicking one of the three options (Explore, Train, and Reports) listed underneath it.
To reduce the number of datasets on display, use the drop-down menu to filter to a specific project that you are a member of.
You can also search for a specific dataset by name using the search bar.
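The filter and search behaviour described above can be pictured with a small sketch. It only illustrates the logic and is not the platform's implementation: the `Dataset` record, its field names, and the `filter_datasets` helper are all hypothetical, and the case-insensitive substring match used for the search is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Dataset:
    # Hypothetical record; the field names are illustrative only.
    project: str
    name: str


def filter_datasets(datasets, project: Optional[str] = None, query: str = ""):
    """Keep datasets in `project` (if given) whose name contains `query`."""
    # Assumption: search is a case-insensitive substring match on the name.
    needle = query.strip().lower()
    return [
        d
        for d in datasets
        if (project is None or d.project == project)
        and needle in d.name.lower()
    ]


# Made-up example data, referenced as project/name on each card.
datasets = [
    Dataset("claims", "motor-inbox"),
    Dataset("claims", "home-inbox"),
    Dataset("support", "motor-queries"),
]

claims_only = filter_datasets(datasets, project="claims")
motor_only = filter_datasets(datasets, query="Motor")
```

Combining both arguments narrows the list further, which mirrors using the project drop-down and the search bar together.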
Selecting a dataset
Each dataset card shows useful information about the dataset:
- The dataset title and description
- The project the dataset is linked to and the dataset name (project/name)
- The sources connected to the dataset
- The model family (language)
- If sentiment analysis is enabled
- When the dataset was last changed (and when it was created on hover)
Select Explore, Train, or Reports beneath the dataset card to navigate to the corresponding page.
Copy an existing dataset
- Select the Duplicate option if you want to copy an existing dataset (this auto-selects the same sources and sentiment selection as that dataset).
- Select any additional sources that you want to connect to the dataset.
What does copying a dataset mean and why would you do it?
When you create a new dataset, you can choose to create a carbon copy of an existing dataset. The copy has the same sources, general fields, sentiment selection, labels, and reviewed examples as the dataset you copied.
You can then work on the copied dataset (which requires a different name) and change it freely without affecting the original.
There are two main reasons why you might want to do this:
- You want to make major changes to your model, for instance to the dataset structure, and want to preserve the original dataset in case you need to revert to it
- You want to use the work already done by annotating the original dataset and create a new dataset to which you can add additional sources of a similar nature.
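The copy semantics described above (same contents, a different name, and no effect on the original) can be sketched as a deep copy. This is a conceptual illustration only, not how the platform stores or duplicates datasets: the `Dataset` class, its abbreviated set of fields, and the `duplicate_dataset` helper are hypothetical.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Dataset:
    # Hypothetical, abbreviated model; a real dataset also carries
    # general fields and reviewed examples.
    name: str
    sources: list = field(default_factory=list)
    labels: list = field(default_factory=list)
    sentiment_enabled: bool = False


def duplicate_dataset(original, new_name, extra_sources=()):
    """Return an independent copy of `original` under a different name."""
    if new_name == original.name:
        raise ValueError("the copy needs a different name")
    # Deep copy so that later edits to the copy never touch the original.
    clone = copy.deepcopy(original)
    clone.name = new_name
    # Optionally connect additional sources of a similar nature.
    clone.sources.extend(extra_sources)
    return clone


original = Dataset(
    "inbox-v1",
    sources=["shared-mailbox"],
    labels=["Urgent"],
    sentiment_enabled=True,
)
draft = duplicate_dataset(original, "inbox-v2", extra_sources=["chat-logs"])
draft.labels.append("Complaint")  # experiment on the copy only
```

After these steps, `original` still has its single label and single source, so it remains available to revert to, while `draft` carries the inherited work plus the new source and label.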
Dataset settings page
As well as the Datasets overview page, each dataset has its own individual settings page. This can be accessed by clicking into the dataset and going to 'Settings'.
A dataset’s settings page contains useful information about the dataset and is where you can perform various actions.
The page is split into three tabs:
- Dataset - where you can update the global settings of the dataset, including title, description and sources.
- Taxonomy - where you can create, read, update and delete labels and their descriptions, extraction fields, general fields, and field types. You can also download the label taxonomy in full.
- Statistics - where you can see annotating statistics and the message metadata properties.