Communications Mining User Guide
Last updated Nov 7, 2024

Model training FAQs

The information on this page is split into two sections:
  • General model training
  • Label training

General model training

What is the objective of training a model?

The objective of training a model is to create a set of training data that is as representative as possible of the dataset as a whole, so that the platform can accurately and confidently predict the relevant labels and general fields for each message. The labels and general fields within a dataset should be intrinsically linked to the overall objectives of the use case and provide significant business value.

Why can I not see anything in Discover if I've just uploaded data into the platform?

As soon as data is uploaded to the platform, the platform begins a process called unsupervised learning, whereby it groups messages into clusters of similar semantic intent. This process can take up to a couple of hours, depending on the size of the dataset, and clusters will appear once it is complete.

How much historical data do I need to train a model?

To be able to train a model, you need a minimum amount of existing historical data. This is used as training data to provide the platform with the necessary information to confidently predict each of the relevant concepts for your analysis and/or automation.

The recommendation for any use case is a minimum of 12 months of historical data, in order to properly capture any seasonality or irregularity in the data (e.g. month-end processes and busy seasons).
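As a rough illustration of the 12-month guideline, the sketch below checks how many whole months a set of messages spans. It is not part of the platform: the function name `months_covered` and the example dates are hypothetical, and it assumes you already know the timestamps of the earliest and latest messages in your export.

```python
from datetime import date

RECOMMENDED_MONTHS = 12  # minimum history recommended in the answer above


def months_covered(earliest: date, latest: date) -> int:
    """Return the number of whole calendar months between two dates."""
    months = (latest.year - earliest.year) * 12 + (latest.month - earliest.month)
    # Do not count the final month if it is not yet complete.
    if latest.day < earliest.day:
        months -= 1
    return max(months, 0)


# Hypothetical date range of an exported mailbox.
span = months_covered(date(2024, 1, 15), date(2024, 11, 7))
if span < RECOMMENDED_MONTHS:
    print(f"Only {span} months of history; seasonality may not be captured.")
```

A range that falls short of 12 months does not prevent training, but month-end processes or busy seasons may be under-represented in the training data.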

Do I need to save my model every time I make a change?

No, you do not need to save your model after making changes. Every time you train the platform on your data (i.e. by annotating any messages), a new model version is created for your dataset. Performance statistics for older model versions can be viewed in Validation.

How do I know what the performance of the model is?

Please check the Validation page in the platform, which reports various performance measures and provides a holistic model health rating. This page updates after every training event and it can be used to identify areas where the model may need more training examples or some label corrections in order to ensure consistency.

Please see the Validation page for full explanations of model performance and how to improve it.

Why are there only 30 clusters available and can we set them individually?

Clusters are a useful way to build up your taxonomy quickly, but users will spend most of their time training in Explore rather than Discover.

If users spend too much time annotating via clusters, there’s a risk of overfitting the model to look for messages that only fit these clusters when making predictions. The more varied examples there are for each label, the better the model will be at finding the different ways of expressing the same intent or concept. This is one of the main reasons why we only show 30 clusters at a time.

However, Discover does retrain once enough training has been completed or a significant volume of data has been added to the platform (see here). When it retrains, it takes the existing training to date into account and tries to present new clusters that are not well covered by the current taxonomy.

For more information on Discover, see here.

How many messages are in each cluster?

There are 30 clusters in total, each containing 12 messages. In the platform, you can choose how many messages are shown per page, in increments between 6 and 12. We recommend annotating 6 at a time to reduce the risk of partially annotating any messages.

What do precision and recall mean?

Precision and recall are metrics used to measure the performance of a machine learning model. A detailed description of each can be found under the Using Validation section of our how-to guides.
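As a brief illustration, the sketch below uses the standard machine learning definitions, not anything specific to how the platform computes its scores. For a given label, precision is the share of the model's predictions that were correct, and recall is the share of the messages that should have the label that the model actually found. The function name and the counts are hypothetical.

```python
def precision_recall(
    true_positives: int, false_positives: int, false_negatives: int
) -> tuple[float, float]:
    """Compute precision and recall for one label from raw counts."""
    predicted = true_positives + false_positives  # everything the model labelled
    actual = true_positives + false_negatives     # everything that should be labelled
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical label: 10 predictions, 8 of them correct, and 8 messages missed.
precision, recall = precision_recall(true_positives=8, false_positives=2, false_negatives=8)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

In this example the label is usually right when it is predicted (high precision) but misses half of the messages it should apply to (lower recall), which typically suggests that more varied training examples are needed.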

Can I return to an earlier version of my model?

You can access the validation overview of earlier models by hovering over ‘Model Version’ in the top left corner of the Validation page. This can be helpful for tracking and comparing progress as you train your model.

If you need to roll your model back to a previous pinned version, please see here for more details.

Label training

Can I change the name of a label later on?

Yes, it’s really easy to do. You can go into the settings for each label and rename it at any point. You can see how to do it here.

How do I find out the number of messages I have annotated?

Information about your dataset, including how many messages have been annotated, is displayed in the Datasets Settings page. To see how to access it, click here.

One of my labels is performing poorly. What can I do to improve it?

If you can see in the Validation page that your label is performing poorly, there are various ways to improve its performance. See here to understand more.

What does the red dial next to my label or general field indicate? How do I get rid of it?

The little red dials next to each label/general field indicate whether more examples are needed for the platform to accurately estimate the label/general field's performance. The dials start to disappear as you provide more training examples and will disappear completely once you reach 25 examples.

After this, the platform will be able to effectively evaluate the performance of a given label/general field and may return a performance warning if the label/general field is not healthy.
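To see which labels are still below the 25-example mark, a tally like the one below can help. This is a minimal sketch, not a platform API: it assumes you have a list of annotated messages in which each entry is the list of label names applied to one message, and the helper name `labels_needing_examples` and the label names are hypothetical.

```python
from collections import Counter

MIN_EXAMPLES = 25  # threshold quoted in the answer above


def labels_needing_examples(annotations, min_examples=MIN_EXAMPLES):
    """Map each under-trained label to the number of examples it still needs."""
    counts = Counter(label for labels in annotations for label in labels)
    return {
        label: min_examples - count
        for label, count in counts.items()
        if count < min_examples
    }


# Hypothetical annotations: one inner list of label names per annotated message.
annotations = [["Invoice"]] * 25 + [["Refund"]] * 10
print(labels_needing_examples(annotations))
```

Labels missing from the result have reached the threshold, so the platform should be able to evaluate their performance.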

Should I avoid annotating empty/uninformative messages?

The platform is able to learn from empty and uninformative messages as long as they are annotated correctly. However, labels for uninformative messages will likely need a significant number of training examples, and should be loosely grouped by concept, to ensure the best performance.

© 2005-2024 UiPath. All rights reserved.