Communications Mining user guide
When email data is uploaded to Communications Mining™, either through an Exchange integration or through the sync-raw-emails API endpoint, the platform receives the raw MIME content of each email. MIME is the standard format that email providers use to send emails. The platform converts this raw content into the message you see in a source.
A transform tag is a named configuration that controls how this conversion happens. For Exchange integrations, the transform tag is set on the source. For the sync-raw-emails and predict-raw-emails API endpoints, it is passed as a parameter in each request.
Transform tags only apply to raw email uploads. They have no effect on data uploaded as a CSV file or as pre-parsed comments, including uploads made using the Communications Mining™ activities.
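To make "raw MIME content" concrete, the following is a minimal sketch that parses a small hand-written MIME message with Python's standard `email` package. It only illustrates what raw email content looks like before any conversion. It is not the platform's conversion logic, and the sample message and the `summarize_raw_email` helper are hypothetical.

```python
from email import message_from_string
from email.policy import default

# A tiny hand-written MIME message, used only for illustration.
RAW_EMAIL = (
    "From: alice@example.com\r\n"
    "To: support@example.com\r\n"
    "Subject: Invoice query\r\n"
    "MIME-Version: 1.0\r\n"
    "Content-Type: text/plain; charset=utf-8\r\n"
    "\r\n"
    "Hello, could you resend invoice 1234?\r\n"
)


def summarize_raw_email(raw: str) -> dict:
    """Parse raw MIME text and return a few fields as plain strings."""
    msg = message_from_string(raw, policy=default)
    # Prefer the plain text part; a real email may only have an HTML part.
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    return {
        "subject": str(msg["Subject"]),
        "from": str(msg["From"]),
        "body": body.strip(),
    }


summary = summarize_raw_email(RAW_EMAIL)
print(summary)
```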
What transform tags control
A transform tag controls, amongst other things:
- Text format - Whether messages are stored as plain text or with markup (rich text formatting).
- Signature extraction - Which model, if any, is used to detect and separate email signatures from the message body. Detected signatures are hidden from the model so that they do not add noise to predictions.
- Default user properties - Whether properties such as the mailbox name an email was synced from are set on each message automatically.
Checking which transform tag a source uses
Use the CLI to list your sources, including the transform tag each one uses:
re get sources
The Transform Tag column shows the current tag for each source. A value of missing means the source has no transform tag set. Raw emails uploaded to such a source are processed with default settings.
Available transform tags
If you do not specify a transform tag when a source is created, a sensible default is used. Tag names follow the format <name>.<version>.<id>, and you must always specify the full tag, for example generic_simple_markup_set_id.0.3LPWBXWR.
| TAG | FEATURES |
|---|---|
| generic_simple_markup_set_id.0.3LPWBXWR | Recommended for most sources. Uses markup. A machine learning model detects and removes signatures. Sets the mailbox name as a user property. |
| generic_simp_mark_noop_setid.0.CHOJQ3XY | As above, but with no signature extraction. The full email body, including signatures, is visible to the model. |
With both tags, only the newest message in an email chain is processed. The quoted history of previous messages is trimmed from the message body.
Older sources may use a transform tag that is not listed here, typically an older plain text tag. These tags continue to work, but prefer the markup tags listed above. Markup tags preserve the formatting of the original email, such as tables, which both the platform UI and generative extraction can take advantage of.
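The `<name>.<version>.<id>` format can be checked locally before a tag is sent to the platform. The following is a minimal sketch, assuming the version and ID are the last two dot-separated parts of the tag. The `TransformTag` type and `parse_transform_tag` helper are hypothetical. This only catches truncated tags, and the platform remains the authority on whether a tag is valid.

```python
from typing import NamedTuple


class TransformTag(NamedTuple):
    name: str
    version: str
    tag_id: str


def parse_transform_tag(tag: str) -> TransformTag:
    """Split a full tag into its three parts, or raise ValueError."""
    # Split from the right so the last two parts are version and ID.
    parts = tag.rsplit(".", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected <name>.<version>.<id>, got {tag!r}"
        )
    return TransformTag(*parts)


parsed = parse_transform_tag("generic_simple_markup_set_id.0.3LPWBXWR")
print(parsed)
```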
When to change the transform tag
For minimum disruption, check the transform tag your source currently uses and pick one with similar features, changing only the behavior you need. Common scenarios:
- Turn off signature extraction - Signature extraction occasionally hides content you want the model to read, for example reference numbers in the signature of a forwarded email. Switch from generic_simple_markup_set_id.0.3LPWBXWR to generic_simp_mark_noop_setid.0.CHOJQ3XY.
  Warning: Disabling signature extraction makes the entire signature visible to the model on every message, which introduces noise and can reduce model quality. Only disable it if the content being hidden has a high impact on your use case.
- Enable markup - If your source uses an older plain text tag and you see formatting issues, or want the model to read tables and other rich content, switch to generic_simple_markup_set_id.0.3LPWBXWR.
- Route automations by mailbox - Both tags listed above record the mailbox each email was synced from as a user property, which downstream automations can read, for example to route work items by region when several mailboxes share one source. Older tags may not set this property.
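The choice between the two listed tags can be written down as a small lookup. This is a sketch only: the feature flags restate the table of available tags, and the `KNOWN_TAGS` dictionary and `pick_tag` helper are hypothetical names, not part of the platform or the CLI.

```python
# Features of the two documented tags, restated from the table above.
KNOWN_TAGS = {
    "generic_simple_markup_set_id.0.3LPWBXWR": {
        "markup": True,
        "signature_extraction": True,
        "sets_mailbox_property": True,
    },
    "generic_simp_mark_noop_setid.0.CHOJQ3XY": {
        "markup": True,
        "signature_extraction": False,
        "sets_mailbox_property": True,
    },
}


def pick_tag(signature_extraction: bool) -> str:
    """Return the documented markup tag with the requested signature setting."""
    for tag, features in KNOWN_TAGS.items():
        if features["signature_extraction"] == signature_extraction:
            return tag
    raise LookupError("no documented tag matches the requested features")


# Keep signatures hidden from the model (the recommended setting).
print(pick_tag(signature_extraction=True))
# Make the full body, including signatures, visible to the model.
print(pick_tag(signature_extraction=False))
```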
Applying a transform tag
Exchange integrations: set the tag on the source
Sources are created automatically when you add a mailbox through an Exchange integration. Use the CLI to change the transform tag of an existing source:
re update source <project>/<source-name> --transform-tag <tag>
For example:
re update source DefaultProject/Demo --transform-tag generic_simp_mark_noop_setid.0.CHOJQ3XY
The update takes effect immediately. Emails already synced into the source are reprocessed with the new settings, which can take an hour or more for large sources. Check Warnings before changing the tag on a production source.
You can also set the tag when creating a source against an existing bucket:
re create source <project>/<source-name> --bucket <bucket> --transform-tag <tag>
If you specify an invalid tag, the platform rejects the request with a 422 error.
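If you script these changes, it can help to build the CLI arguments in one place. The following is a minimal sketch that assembles the same `re update source` and `re create source` commands shown above and, by default, only prints them. The helper names and the `dry_run` switch are hypothetical. Running with `dry_run=False` assumes the `re` CLI is installed and configured.

```python
import subprocess
from typing import List, Optional


def update_source_cmd(project: str, source: str, tag: str) -> List[str]:
    """Arguments for changing the transform tag of an existing source."""
    return ["re", "update", "source", f"{project}/{source}",
            "--transform-tag", tag]


def create_source_cmd(project: str, source: str, bucket: str, tag: str) -> List[str]:
    """Arguments for creating a source against an existing bucket."""
    return ["re", "create", "source", f"{project}/{source}",
            "--bucket", bucket, "--transform-tag", tag]


def run(cmd: List[str], dry_run: bool = True) -> Optional[subprocess.CompletedProcess]:
    """Print the command, and execute it only when dry_run is False."""
    print(" ".join(cmd))
    if dry_run:
        return None
    # check=True raises CalledProcessError if the CLI exits with an error.
    return subprocess.run(cmd, check=True)


cmd = update_source_cmd(
    "DefaultProject", "Demo", "generic_simp_mark_noop_setid.0.CHOJQ3XY"
)
result = run(cmd)  # dry run: prints the command without executing it
```

Keeping the dry run as the default makes it harder to trigger a reprocess of a production source by accident.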
API uploads: pass the tag in the request
The sync-raw-emails and predict-raw-emails endpoints take a transform_tag parameter in the request body. Pass the same tag consistently for all uploads into a source, and use the same tag at prediction time that was used to upload the training data, so that the messages the model sees at runtime match the messages it was trained on.
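One way to keep the tag consistent is to store it once per source and inject it into every request body. The following is a minimal sketch: only the `transform_tag` parameter comes from this page. The `SOURCE_TAGS` mapping, the `with_transform_tag` helper, and the placeholder `documents` payload are assumptions for illustration, so check the API reference for the actual request schema.

```python
# One tag per source, defined in a single place (hypothetical mapping).
SOURCE_TAGS = {
    "DefaultProject/Demo": "generic_simple_markup_set_id.0.3LPWBXWR",
}


def with_transform_tag(source: str, payload: dict) -> dict:
    """Return a copy of the payload with the source's transform_tag set."""
    body = dict(payload)  # shallow copy, so the caller's dict is not mutated
    body["transform_tag"] = SOURCE_TAGS[source]
    return body


# Placeholder payloads: the "documents" field and its contents are
# assumptions, not the documented request schema.
upload_payload = {"documents": ["<raw email placeholder>"]}
predict_payload = {"documents": ["<raw email placeholder>"]}

sync_body = with_transform_tag("DefaultProject/Demo", upload_payload)
predict_body = with_transform_tag("DefaultProject/Demo", predict_payload)

# Both requests carry the same tag, so runtime messages are parsed the
# same way as the training data.
print(sync_body["transform_tag"] == predict_body["transform_tag"])
```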
Testing a new transform tag
Before changing the transform tag on a production source, test the new tag on a copy of your data:
- Create a new source referencing the same bucket, with the new transform tag:
  re create source <project>/<test-source-name> --bucket <bucket> --transform-tag <new-tag>
- Create a new dataset, or duplicate your existing one, and add the new source to it.
- Review how messages are parsed, and monitor Validation if you train on the new data.
Reprocessing existing data into a test source does not consume AI units.
Warnings
- Changing the transform tag on an existing source reprocesses its data. Reprocessing can take an hour or more for large sources. During reprocessing, the dataset contains a mix of messages parsed with the old and new settings, which can temporarily lower the model score.
- Changing how text is parsed can affect model performance. The model was trained on messages parsed with the old settings. Significant changes, such as disabling signature extraction or switching between plain text and markup, change the text the model sees, and validation scores may shift. Monitor Validation after the change, and retrain as necessary.
- Production automations can break. If a stream on the affected dataset is consumed by a production automation, changing the message format, for example from plain text to markup, can break the downstream automation. Test the change on a separate source and dataset first, and update your automation before changing the production source.