French Text Classification
This model is a generic text classification model for French that uses transfer learning. It must be trained before you can use it for predictions. It is based on CamemBERT embeddings, on top of which a three-layer fully connected neural network classifies the data. CamemBERT is a state-of-the-art French language model based on the RoBERTa architecture and pretrained on the French subcorpus of the multilingual OSCAR corpus (available through HuggingFace).
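To make the "embeddings plus three fully connected layers" description concrete, here is a minimal, framework-free sketch of such a classification head applied to one precomputed sentence embedding. Everything specific in it is an assumption for illustration: the 768-dimensional input (the hidden size of CamemBERT-base), the hidden widths, the ReLU activations and the softmax output. The package's real layer sizes and activations are not documented on this page, and the CamemBERT encoder itself is not reproduced.

```python
import math
import random

# Assumed sizes for illustration only (not the package's documented values).
EMBED_DIM = 768          # hidden size of CamemBERT-base
HIDDEN_DIMS = (256, 64)  # hypothetical widths of the first two dense layers


def init_layer(n_in, n_out, rng):
    """Random weight matrix (n_out rows of n_in weights) and zero bias for one dense layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer):
    """Fully connected layer: W @ x + b."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(logits):
    """Numerically stable softmax: turns logits into confidences that sum to 1."""
    m = max(logits)
    exps = [math.exp(a - m) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]


def build_head(num_classes, seed=0):
    """Three dense layers: EMBED_DIM -> 256 -> 64 -> num_classes."""
    rng = random.Random(seed)
    dims = (EMBED_DIM, *HIDDEN_DIMS, num_classes)
    return [init_layer(a, b, rng) for a, b in zip(dims, dims[1:])]


def classify(embedding, head):
    """ReLU after the first two layers, softmax after the third."""
    h = embedding
    for layer in head[:-1]:
        h = relu(dense(h, layer))
    return softmax(dense(h, head[-1]))
```

With an untrained head the confidences are meaningless; training adjusts the weights so that the softmax output matches the class labels. The softmax is what yields per-class confidences between 0 and 1.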
Input: the text to classify, as a String. Example: "Mon séjour dans cet hôtel s’est très bien passé" ("My stay at this hotel went very well").
Output: a JSON string containing the predicted class name, the confidence of that prediction (between 0 and 1), and, in the “all_predictions” field, a list of all classes with their associated confidence.
Example:
{
  "class": "Positive",
  "confidence": 0.9996203184127808,
  "all_predictions": [
    {
      "class": "Negative",
      "confidence": 0.0003796691307798028
    },
    {
      "class": "Positive",
      "confidence": 0.9996203184127808
    }
  ]
}
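A client that receives this output will usually parse it and read the winning class plus the per-class scores. The sketch below does that with the standard `json` module, using the example response from this page. The function name `parse_prediction` and the optional `threshold` cut-off are hypothetical additions for illustration; how the skill is called (and how the string reaches your code) is not covered here.

```python
import json

# The example response shown above, as a Python string.
RESPONSE = '''{
  "class": "Positive",
  "confidence": 0.9996203184127808,
  "all_predictions": [
    {"class": "Negative", "confidence": 0.0003796691307798028},
    {"class": "Positive", "confidence": 0.9996203184127808}
  ]
}'''


def parse_prediction(response: str, threshold: float = 0.5):
    """Parse the JSON string returned by the model.

    Returns the predicted class, or None when its confidence is below
    `threshold` (a hypothetical, caller-chosen cut-off), together with a
    class -> confidence mapping built from "all_predictions".
    """
    result = json.loads(response)
    scores = {p["class"]: p["confidence"] for p in result["all_predictions"]}
    label = result["class"] if result["confidence"] >= threshold else None
    return label, scores


label, scores = parse_prediction(RESPONSE)
print(label)                           # Positive
print(round(sum(scores.values()), 6))  # 1.0
```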
All three pipeline types (Full Pipeline, Training, and Evaluation) are supported by this package.
When you train the model for the first time, the classes are inferred from the entire dataset provided. Once the model is trained, the same classes are used for predictions and for future retraining. To reset the classes (or add new ones), retrain the model with the reset environment variable set (see below).
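The class-selection rule described above can be sketched as a small function. This is an illustration under assumptions, not the package's code: `resolve_classes` is a hypothetical name, sorting the inferred classes is an arbitrary choice, and rejecting unseen labels when retraining without reset is an assumption, since this page does not say how such labels are handled.

```python
def resolve_classes(labels, previous_classes=None, reset=False):
    """Decide which classes a training run uses.

    labels: target-column values from the whole dataset.
    previous_classes: classes saved by an earlier run (None on the first run).
    reset: mirrors the package's `reset` environment variable.
    """
    if previous_classes is None or reset:
        # First run, or reset requested: infer the class set from the entire dataset.
        return sorted(set(labels))
    # Retraining without reset: keep the classes from the previous training.
    unknown = set(labels) - set(previous_classes)
    if unknown:
        # Assumption: new labels are rejected; set reset to change the classes.
        raise ValueError(f"labels not seen in previous training: {sorted(unknown)}")
    return list(previous_classes)
```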
Using a GPU makes pipeline execution much faster and is recommended for training on large datasets.
This ML Package looks for JSON and CSV files at the top level of your dataset; files in subdirectories are ignored.
- CSV files: each file must have a header row with columns named after input_column (default “text”) and target_column (default “class”), and one data point per line.
- JSON files: each file must contain exactly one data point, with the fields input_column (default “text”) and target_column (default “class”).
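The dataset layout described above can be illustrated with a small loader that collects (text, label) pairs from the top level of a folder. It is a sketch, not the package's loader: the function name `load_dataset`, UTF-8 encoding and case-insensitive extension matching are assumptions.

```python
import csv
import json
from pathlib import Path


def load_dataset(folder, input_column="text", target_column="class"):
    """Collect (text, label) pairs from CSV and JSON files directly in `folder`."""
    pairs = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # subdirectories are not searched
        suffix = path.suffix.lower()
        if suffix == ".csv":
            with path.open(newline="", encoding="utf-8") as f:
                # DictReader uses the header row, so the column names must match.
                for row in csv.DictReader(f):
                    pairs.append((row[input_column], row[target_column]))
        elif suffix == ".json":
            with path.open(encoding="utf-8") as f:
                record = json.load(f)  # exactly one data point per JSON file
            pairs.append((record[input_column], record[target_column]))
    return pairs
```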
Environment variables:
- epochs: number of training epochs for Training or Full Pipeline runs (default 10)
- input_column: change this value to match the name of your dataset’s input column (default “text”)
- target_column: change this value to match the name of your dataset’s target column (default “class”)
- reset: add this environment variable to retrain the three-layer neural network from scratch and/or change the classes. By default, the model uses transfer learning and keeps the same classes as the previous training.
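The variables listed above, with their defaults, can be summarized as a small settings reader. This is a hedged sketch: `read_settings` is a hypothetical helper, and treating the mere presence of `reset` as "enabled" (whatever its value) is an assumption based on the wording "add this environment variable".

```python
import os
from typing import Mapping, Optional


def read_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the documented environment variables, applying their defaults."""
    env = os.environ if env is None else env
    return {
        "epochs": int(env.get("epochs", "10")),
        "input_column": env.get("input_column", "text"),
        "target_column": env.get("target_column", "class"),
        # Assumption: presence alone enables reset; its value is not inspected.
        "reset": "reset" in env,
    }
```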
The evaluate function produces two artifacts:
- predictions.csv: CSV file with 4 columns:
  - text: input text being classified.
  - class: ground truth class from the dataset.
  - predicted_class: class predicted by the model.
  - confidence: confidence score associated with the prediction.
- metrics.json: JSON file containing accuracy, macro-averaged F1, precision and recall, along with F1, precision and recall for each class.
Example:
{
  "accuracy": 0.7572500109672546,
  "f1_macro": 0.756912701179931,
  "precision_macro": 0.7594798901045778,
  "recall_macro": 0.7576722549210066,
  "details": [
    { "class": "Negative", "f1": 0.7659677030609786, "precision": 0.7329335793357934, "recall": 0.8021201413427562 },
    { "class": "Positive", "f1": 0.7478576992988835, "precision": 0.7860262008733624, "recall": 0.7132243684992571 }
  ]
}
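The metrics in metrics.json can be re-derived from predictions.csv: per-class precision, recall and F1, plus macro averages taken as unweighted means over classes. The example above is consistent with that reading, for instance (0.76597 + 0.74786) / 2 ≈ 0.7569 for f1_macro. The sketch below is an illustrative re-derivation, not the package's evaluation code; the function names are hypothetical, and reporting 0.0 when a ratio's denominator is zero is an assumption.

```python
import csv
from collections import Counter


def compute_metrics(rows):
    """Build a metrics.json-style dict from (true_class, predicted_class) pairs."""
    classes = sorted({t for t, _ in rows} | {p for _, p in rows})
    tp, fp, fn = Counter(), Counter(), Counter()
    for true, pred in rows:
        if true == pred:
            tp[true] += 1
        else:
            fp[pred] += 1  # predicted class got a false positive
            fn[true] += 1  # true class got a false negative
    details = []
    for c in classes:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        details.append({"class": c, "f1": f1, "precision": precision, "recall": recall})
    n = len(details)
    return {
        "accuracy": sum(tp.values()) / len(rows),
        # Macro averages: unweighted mean of the per-class values.
        "f1_macro": sum(d["f1"] for d in details) / n,
        "precision_macro": sum(d["precision"] for d in details) / n,
        "recall_macro": sum(d["recall"] for d in details) / n,
        "details": details,
    }


def metrics_from_predictions_csv(path):
    """Read the class and predicted_class columns of predictions.csv."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = [(r["class"], r["predicted_class"]) for r in csv.DictReader(f)]
    return compute_metrics(rows)
```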
CamemBERT: a Tasty French Language Model by Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.