- API docs
- CLI
- Integration guides
- Blog
- How machines learn to understand words: a guide to embeddings in NLP
- Prompt-based learning with Transformers
- Efficient Transformers II: knowledge distillation & fine-tuning
- Efficient Transformers I: attention mechanisms
- Deep hierarchical unsupervised intent modelling: getting value without training data
- Fixing annotating bias with Communications Mining
- Active learning: better ML models in less time
- It's all in the numbers - assessing model performance with metrics
- Why model validation is important
- Comparing Communications Mining and Google AutoML for conversational data intelligence
Batch upload
BILLABLE OPERATION
You will be charged 1 AI unit per created comment, or per updated comment (based on its unique ID) if its text was modified.
The CLI allows you to upload comments (including pre-annotated comments) in batch. In addition to importing data into Communications Mining in those cases where a live connection is not required, it can be used to upload pre-existing training data into Communications Mining, or to overwrite existing comments or labels in Communications Mining.
The CLI expects data in JSONL format (also called newline-delimited JSON), where each line is a JSON value. Many tools will be able to export JSONL files out-of-the-box. Please contact support if you have any questions.
Each line in the JSONL file represents a comment object. Each comment object should have at least a unique ID, a timestamp, and a piece of text, but can have other fields such as metadata. Please see the Comment reference to learn which fields to set for your data.
Each line in the JSONL file should have the following format (only required fields shown). (Note that this is shown with indentation for readability, but should be all on one line in your file.)
{
"comment": {
"id": "<unique id>",
"timestamp": "<timestamp>",
"messages": [
{
"body": {
"text": "<text of the comment>"
}
}
]
}
}
{
"comment": {
"id": "<unique id>",
"timestamp": "<timestamp>",
"messages": [
{
"body": {
"text": "<text of the comment>"
}
}
]
}
}
If you would like to upload labels alongside comments, you can include them like so (same as above, this is shown with indentation for readability, but should be all on one line in your file):
{
"comment": {
"id": "<unique id>",
"timestamp": "<timestamp>",
"messages": [
{
"body": {
"text": "<text of the comment>"
}
}
]
},
"annotating": {
"assigned": [
{
"name": "<Your Label Name>",
"sentiment": "<positive|negative>"
},
{
"name": "<Another Label Name>",
"sentiment": "<positive|negative>"
}
]
}
}
{
"comment": {
"id": "<unique id>",
"timestamp": "<timestamp>",
"messages": [
{
"body": {
"text": "<text of the comment>"
}
}
]
},
"annotating": {
"assigned": [
{
"name": "<Your Label Name>",
"sentiment": "<positive|negative>"
},
{
"name": "<Another Label Name>",
"sentiment": "<positive|negative>"
}
]
}
}
Uploading Comments
The command below will upload comments to the specified source. We recommend to upload comments into a new empty source, as it makes rolling back easier if something went wrong - you just delete the source.
re create comments \
--source <project_name/source_name> \
--file <file_name.jsonl>
re create comments \
--source <project_name/source_name> \
--file <file_name.jsonl>
--overwrite
flag. The comments will be overwritten based on the comment.id
field. We recommend that you make a backup copy of the source before updating comments in order to be able to recover the
original comments if something goes wrong.
Uploading Comments with Labels
If you would like to upload labels together with your comments, you should specify a dataset into which the labels should be uploaded. The dataset should be connected to the source before you start uploading.
re create comments \
--source <project_name/source_name> \
--dataset <project_name/dataset_name> \
--file <file_name.jsonl>
re create comments \
--source <project_name/source_name> \
--dataset <project_name/dataset_name> \
--file <file_name.jsonl>
--overwrite
flag. Note that this will replace existing labels with new labels (not add existing labels to new labels). We recommend that
you make a backup copy of the dataset before overwriting labels in order to be able to recover the original labels if something
goes wrong.