Batch delete
Communications Mining Developer Guide
Last updated Oct 3, 2024
The CLI allows you to delete comments based on a time period, for example all comments older than two years. This is useful for cleaning up historical data. Note that the time period is based on the comment's timestamp field, rather than the datetime the comment was uploaded to Communications Mining.
Before deleting or modifying your comments, you may want to back up annotated comments so that you do not accidentally lose the manual work of the model trainers:
re get comments \
<project_name/source_name> \
--dataset <project_name/dataset_name> \
--reviewed-only true \
--file <output_file_name.jsonl>
If the source was added to multiple datasets, you should run the above command for each of those datasets.
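When a source belongs to several datasets, the per-dataset backup can be scripted. The following is a minimal sketch, not an official recipe: the function name backup_reviewed, the RE_BIN override, and the output file naming are hypothetical, while the re get comments flags are the ones shown in the command above.

```shell
# Sketch: run the backup command once per dataset for a single source.
# RE_BIN is a hypothetical override (defaults to the `re` CLI); setting
# RE_BIN=echo prints the commands instead of running them (a dry run).
backup_reviewed() {
  src="$1"
  shift
  for dataset in "$@"; do
    # One output file per dataset; "/" is replaced so the name is a valid filename.
    out="backup_$(printf '%s' "$dataset" | tr '/' '_').jsonl"
    "${RE_BIN:-re}" get comments "$src" \
      --dataset "$dataset" \
      --reviewed-only true \
      --file "$out" || return 1
  done
}

# Example call with placeholder names:
# backup_reviewed my_project/my_source my_project/dataset_a my_project/dataset_b
```

The loop stops at the first failing backup, so a later delete step is not reached with an incomplete set of backup files.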
Warning:
DELETING ANNOTATIONS CHANGES MODEL PERFORMANCE
If the comments you are deleting were added to one or more datasets where they could have been annotated, deleting annotated comments will result in a change of model performance in those datasets going forward (pinned models will be unaffected). You can optionally tell the CLI to skip annotated comments.
The command below will delete all comments in a source between FROM_TIMESTAMP and TO_TIMESTAMP, excluding annotated comments. The timestamps should be in RFC 3339 format, e.g. 1970-01-02T03:04:05Z.
re delete bulk \
--source <project_name/source_name> \
--include-annotated=false \
--from-timestamp FROM_TIMESTAMP \
--to-timestamp TO_TIMESTAMP
If you are sure you want to delete annotated comments, you can set --include-annotated=true.
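For the "older than two years" case mentioned at the top of this page, the upper bound can be computed rather than typed by hand. The sketch below assumes a system with GNU date and falls back to BSD/macOS date. The helper name cutoff_timestamp and the lower bound value are illustrative assumptions, not part of the CLI. The delete command is only printed so it can be reviewed before being run.

```shell
# Sketch: compute an RFC 3339 UTC timestamp N years in the past.
cutoff_timestamp() {
  years="$1"
  # GNU date accepts -d "<N> years ago"; BSD/macOS date uses -v-<N>y instead.
  date -u -d "$years years ago" '+%Y-%m-%dT%H:%M:%SZ' 2>/dev/null ||
    date -u -v-"${years}y" '+%Y-%m-%dT%H:%M:%SZ'
}

TO_TIMESTAMP="$(cutoff_timestamp 2)"
# Placeholder lower bound (the example value from this page); adjust as needed.
FROM_TIMESTAMP="1970-01-02T03:04:05Z"

# Print the command for review instead of executing it.
echo re delete bulk \
  --source '<project_name/source_name>' \
  --include-annotated=false \
  --from-timestamp "$FROM_TIMESTAMP" \
  --to-timestamp "$TO_TIMESTAMP"
```

Remember that the range is matched against each comment's timestamp field, not its upload time.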