Multilabel Text Classification
Multilabel Text Classification is currently in public preview.
UiPath® is committed to stability and quality of our products, but preview features are always subject to change based on feedback that we receive from our customers. Using preview features is not recommended for production deployments.
This is a generic, retrainable model for tagging a text with multiple labels. This ML Package must be trained; if it is deployed without training first, the deployment fails with an error stating that the model is not trained. It is based on BERT, a self-supervised method for pretraining natural language processing systems. A GPU is recommended, especially during training, as it delivers a roughly 5-10x speed improvement.
This multilingual model supports the languages listed below. These languages were chosen because they are the top 100 languages with the largest Wikipedias:
- Afrikaans
- Albanian
- Arabic
- Aragonese
- Armenian
- Asturian
- Azerbaijani
- Bashkir
- Basque
- Bavarian
- Belarusian
- Bengali
- Bishnupriya Manipuri
- Bosnian
- Breton
- Bulgarian
- Burmese
- Catalan
- Cebuano
- Chechen
- Chinese (Simplified)
- Chinese (Traditional)
- Chuvash
- Croatian
- Czech
- Danish
- Dutch
- English
- Estonian
- Finnish
- French
- Galician
- Georgian
- German
- Greek
- Gujarati
- Haitian
- Hebrew
- Hindi
- Hungarian
- Icelandic
- Ido
- Indonesian
- Irish
- Italian
- Japanese
- Javanese
- Kannada
- Kazakh
- Kirghiz
- Korean
- Latin
- Latvian
- Lithuanian
- Lombard
- Low Saxon
- Luxembourgish
- Macedonian
- Malagasy
- Malay
- Malayalam
- Marathi
- Minangkabau
- Nepali
- Newar
- Norwegian (Bokmal)
- Norwegian (Nynorsk)
- Occitan
- Persian (Farsi)
- Piedmontese
- Polish
- Portuguese
- Punjabi
- Romanian
- Russian
- Scots
- Serbian
- Serbo-Croatian
- Sicilian
- Slovak
- Slovenian
- South Azerbaijani
- Spanish
- Sundanese
- Swahili
- Swedish
- Tagalog
- Tajik
- Tamil
- Tatar
- Telugu
- Turkish
- Ukrainian
- Urdu
- Uzbek
- Vietnamese
- Volapük
- Waray-Waray
- Welsh
- West Frisian
- Western Punjabi
- Yoruba
The output is a JSON object with two lists. The first list contains the predicted label(s), and the second list contains the confidence score (between 0 and 1) associated with each predicted label.
Example:
{
"labels": [
"deliver",
"payment"
],
"confidence": [
0.780,
0.899
]
}
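A client consuming this output usually wants each label next to its confidence. The following is a minimal Python sketch, assuming the response body is exactly the JSON shown above; `pair_predictions` and its `min_confidence` cut-off are hypothetical helpers for illustration, not part of the ML Skill API.

```python
import json

# Sample response, copied from the example above.
raw = '{"labels": ["deliver", "payment"], "confidence": [0.780, 0.899]}'


def pair_predictions(response_text, min_confidence=0.0):
    """Pair each predicted label with its confidence score.

    min_confidence is a hypothetical client-side cut-off, not a package setting.
    Returns (label, confidence) tuples sorted by descending confidence.
    """
    response = json.loads(response_text)
    labels = response["labels"]
    confidences = response["confidence"]
    if len(labels) != len(confidences):
        raise ValueError("labels and confidence lists must have the same length")
    pairs = [
        (label, score)
        for label, score in zip(labels, confidences)
        if score >= min_confidence
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


print(pair_predictions(raw))       # [('payment', 0.899), ('deliver', 0.78)]
print(pair_predictions(raw, 0.8))  # [('payment', 0.899)]
```

Sorting by confidence is only a presentation choice; the package itself returns labels and confidences as two parallel lists.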
This package supports all three types of pipelines (Full Training, Training, and Evaluation). For most use cases, no parameters need to be specified; the model uses advanced techniques to find a performant model. After the first training, the model uses incremental learning: each subsequent training run starts from the version produced at the end of the previous run.
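The model uses two columns (or properties) from the dataset, a text column and a label column, named text and label by default. The names of these two columns and/or properties are configurable using environment variables.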
CSV file format
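The text and label columns are specified by dataset.text_column_name (if not modified, the default value is text) and dataset.target_column_name (if not modified, the default value is labels).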
For example, a single CSV file can look like this:
text,labels
"I love this actor but I hate his movies", ['positive', 'negative']
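Because the label list itself contains a comma, a standard CSV writer wraps that field in double quotes. Below is a minimal sketch using Python's standard `csv` and `ast` modules, assuming the label column holds a Python-style list literal as in the example above; `read_dataset` and its parameters are hypothetical and only mirror the dataset.text_column_name and dataset.target_column_name settings.

```python
import ast
import csv
import io


def read_dataset(handle, text_column="text", target_column="labels"):
    """Return (text, [labels]) pairs from a CSV handle.

    text_column / target_column are hypothetical parameters standing in for
    dataset.text_column_name and dataset.target_column_name.
    """
    reader = csv.DictReader(handle)
    # ast.literal_eval safely turns "['positive', 'negative']" into a list.
    return [(row[text_column], ast.literal_eval(row[target_column])) for row in reader]


# Write a one-row dataset with the standard csv writer.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["text", "labels"])
writer.writerow(["I love this actor but I hate his movies", str(["positive", "negative"])])
# The label field contains a comma, so the writer wraps it in double quotes:
# I love this actor but I hate his movies,"['positive', 'negative']"
print(buffer.getvalue())

buffer.seek(0)
print(read_dataset(buffer))
```

`ast.literal_eval` only evaluates literals, so it is a safer choice than `eval` for parsing label lists from files.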
You can use either GPU or CPU for training. We recommend using GPU since it's faster.
Environment variables
- dataset.text_column_name - default value text
- model.epochs - default value 100
- dataset.target_column_name - default value label
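The override behaviour can be pictured as a lookup with defaults. This is an illustrative Python sketch, not the package's own code: it assumes the settings arrive as plain environment variables named exactly as listed, and `resolve_settings` is a hypothetical helper.

```python
import os

# Defaults as listed above.
DEFAULTS = {
    "dataset.text_column_name": "text",
    "model.epochs": "100",
    "dataset.target_column_name": "label",
}


def resolve_settings(environ=os.environ):
    """Return training settings, letting values in `environ` override defaults.

    Hypothetical helper: illustrates override semantics only.
    """
    settings = {name: environ.get(name, default) for name, default in DEFAULTS.items()}
    # Environment values are strings, so convert the epoch count explicitly.
    settings["model.epochs"] = int(settings["model.epochs"])
    return settings


print(resolve_settings({"model.epochs": "20"}))
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to test without touching the real process environment.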
Confusion matrix
To cover every label, the confusion matrix for Multilabel Text Classification is a JSON file that contains one confusion matrix per label, laid out as [[#True Positives, #True Negatives], [#False Positives, #False Negatives]]:
{
"labels":[
"positive",
"negative"
],
"multilabel_confusion_matrix":[
[
[
83,
4
],
[
21,
4
]
],
[
[
105,
1
],
[
6,
0
]
]
]
}
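To see where the per-label counts come from, here is a minimal Python sketch that builds the same JSON structure from true and predicted label sets, following the [[TP, TN], [FP, FN]] layout stated above. The function name and the four-sample test set are hypothetical.

```python
import json


def multilabel_confusion(true_sets, predicted_sets, labels):
    """Build per-label matrices laid out as
    [[true positives, true negatives], [false positives, false negatives]]."""
    matrices = []
    for label in labels:
        tp = tn = fp = fn = 0
        for truth, predicted in zip(true_sets, predicted_sets):
            actual = label in truth
            guessed = label in predicted
            if actual and guessed:
                tp += 1
            elif not actual and not guessed:
                tn += 1
            elif guessed:  # predicted but not actually present
                fp += 1
            else:  # present but not predicted
                fn += 1
        matrices.append([[tp, tn], [fp, fn]])
    return {"labels": list(labels), "multilabel_confusion_matrix": matrices}


# Hypothetical test set of four samples.
y_true = [{"positive", "negative"}, {"positive"}, {"negative"}, {"positive"}]
y_pred = [{"positive"}, {"positive", "negative"}, {"negative"}, set()]

result = multilabel_confusion(y_true, y_pred, ["positive", "negative"])
print(json.dumps(result))
```

Each label is treated as an independent yes/no decision, which is why every sample contributes exactly one count to every label's matrix.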
Classification report
{
"positive": {
"precision": 0.89, "recall": 0.78, "f1-score": 0.84242424242424243, "support": 100
},
"negative": {
"precision": 0.9, "recall": 0.87, "f1-score": 0.86765432236398, "support": 89
}
}
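The per-label metrics in the report follow the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), F1 is the harmonic mean of the two, and support is the number of test samples that actually carry the label (TP + FN). A minimal Python sketch under those definitions; it returns 0.0 when a denominator is zero, which is an assumption (other tools may warn or use a different value). The function name and sample data are hypothetical.

```python
def classification_report(true_sets, predicted_sets, labels):
    """Per-label precision, recall, F1-score and support."""
    report = {}
    pairs = list(zip(true_sets, predicted_sets))
    for label in labels:
        tp = sum(1 for t, p in pairs if label in t and label in p)
        fp = sum(1 for t, p in pairs if label not in t and label in p)
        fn = sum(1 for t, p in pairs if label in t and label not in p)
        # Assumption: report 0.0 instead of raising on a zero denominator.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {
            "precision": precision,
            "recall": recall,
            "f1-score": f1,
            "support": tp + fn,
        }
    return report


# Hypothetical test set of four samples.
y_true = [{"positive", "negative"}, {"positive"}, {"negative"}, {"positive"}]
y_pred = [{"positive"}, {"positive", "negative"}, {"negative"}, set()]

print(classification_report(y_true, y_pred, ["positive", "negative"]))
```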
Evaluation
This is a CSV file with predictions on the test set used for evaluation.
label, text, predictions, confidence
{'positive', 'negative'}, "I love this actor but I hate his movies", ['positive', 'negative'], [0.9118645787239075, 0.971538782119751]
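A quick way to inspect this file is to compare, row by row, the label column with the predictions column. A minimal Python sketch, assuming both columns have already been parsed into Python collections; `diff_row` and the second sample row are hypothetical.

```python
def diff_row(true_labels, predicted_labels):
    """Compare one evaluation row: which labels were missed or added."""
    truth = set(true_labels)
    predicted = set(predicted_labels)
    return {
        "missed": sorted(truth - predicted),    # expected but not predicted
        "spurious": sorted(predicted - truth),  # predicted but not expected
        "exact_match": truth == predicted,      # every label correct
    }


rows = [
    ({"positive", "negative"}, ["positive", "negative"]),  # row from the example above
    ({"positive"}, ["positive", "negative"]),              # hypothetical row
]
results = [diff_row(t, p) for t, p in rows]
exact_match_ratio = sum(r["exact_match"] for r in results) / len(results)
print(results)
print(exact_match_ratio)  # 0.5
```

Converting both columns to sets ignores label order, which matches how multilabel predictions are usually compared.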