Communications Mining user guide

Email transform tags

When email data is uploaded to Communications Mining™, either through an Exchange integration or through the sync-raw-emails API endpoint, the platform receives the raw MIME content of each email. MIME is the standard format that email providers use to send emails. The platform converts this raw content into the message you see in a source.

A transform tag is a named configuration that controls how this conversion happens. For Exchange integrations, the transform tag is set on the source. For the sync-raw-emails and predict-raw-emails API endpoints, it is passed as a parameter in each request.

Note:

Transform tags only apply to raw email uploads. They have no effect on data uploaded as a CSV file or as pre-parsed comments, including uploads made using the Communications Mining™ activities.

What transform tags control

A transform tag controls, amongst other things:

Text format - Whether messages are stored as plain text or with markup (rich text formatting).
Signature extraction - Which model, if any, is used to detect and separate email signatures from the message body. Detected signatures are hidden from the model so that they do not add noise to predictions.
Default user properties - Whether properties such as the mailbox name an email was synced from are set on each message automatically.

Checking which transform tag a source uses

Use the CLI to list your sources, including the transform tag each one uses:

re get sources

The Transform Tag column shows the current tag for each source. A value of missing means the source has no transform tag set. Raw emails uploaded to such a source are processed with default settings.

Available transform tags

If you do not specify a transform tag when a source is created, a sensible default is used. Tag names follow the format <name>.<version>.<id>, and you must always specify the full tag, for example generic_simple_markup_set_id.0.3LPWBXWR.

TAG	FEATURES
`generic_simple_markup_set_id.0.3LPWBXWR`	Recommended for most sources. Uses markup. A machine learning model detects and removes signatures. Sets the mailbox name as a user property.
`generic_simp_mark_noop_setid.0.CHOJQ3XY`	As above, but with no signature extraction. The full email body, including signatures, is visible to the model.

With both tags, only the newest message in an email chain is processed. The quoted history of previous messages is trimmed from the message body.
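Because the full tag must always be given, it can help to check the <name>.<version>.<id> shape before a tag is passed to the CLI or an API request. The following is a minimal Python sketch, not part of the platform: TransformTag and parse_transform_tag are hypothetical names, and the check covers only the three-part shape. It cannot tell whether the platform recognizes the tag.

```python
from typing import NamedTuple


class TransformTag(NamedTuple):
    """The three dot-separated parts of a full transform tag."""
    name: str
    version: str
    tag_id: str


def parse_transform_tag(tag: str) -> TransformTag:
    """Split a full tag into its parts.

    Raises ValueError for anything that is not <name>.<version>.<id>,
    for example a tag given without its version and id.
    """
    parts = tag.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <name>.<version>.<id>, got {tag!r}")
    return TransformTag(*parts)


if __name__ == "__main__":
    # Tag value copied from the table above.
    print(parse_transform_tag("generic_simple_markup_set_id.0.3LPWBXWR"))
```

A tag that passes this check can still be rejected by the platform, so treat it as an early guard against truncated tags rather than as validation.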

Note:

Older sources may use a transform tag that is not listed here, typically an older plain text tag. These tags continue to work, but the markup tags listed above are preferred because they preserve the formatting of the original email, such as tables, which both the platform UI and generative extraction can use.

When to change the transform tag

For minimum disruption, check the transform tag your source currently uses and pick one with similar features, changing only the behavior you need. Common scenarios:

Turn off signature extraction - Signature extraction occasionally hides content you want the model to read, for example reference numbers in the signature of a forwarded email. Switch from generic_simple_markup_set_id.0.3LPWBXWR to generic_simp_mark_noop_setid.0.CHOJQ3XY.

Warning:

Disabling signature extraction makes the entire signature visible to the model on every message, which introduces noise and can reduce model quality. Only disable it if the content being hidden has a high impact on your use case.

Enable markup - If your source uses an older plain text tag and you see formatting issues, or want the model to read tables and other rich content, switch to generic_simple_markup_set_id.0.3LPWBXWR.
Route automations by mailbox - Both tags listed above record the mailbox each email was synced from as a user property, which downstream automations can read, for example to route work items by region when several mailboxes share one source. Older tags may not set this property.
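For the two tags listed on this page, the scenarios above come down to one choice: keep signature extraction or turn it off. A small Python sketch of that choice follows. The function and constant names are hypothetical, and the tag values are copied from the Available transform tags table.

```python
# Tag values copied from the "Available transform tags" table.
TAG_SIGNATURES_EXTRACTED = "generic_simple_markup_set_id.0.3LPWBXWR"
TAG_SIGNATURES_KEPT = "generic_simp_mark_noop_setid.0.CHOJQ3XY"


def choose_transform_tag(extract_signatures: bool = True) -> str:
    """Return the markup tag that matches the desired signature behavior.

    Both tags use markup and set the mailbox name as a user property,
    so signature extraction is the only behavior that differs.
    """
    if extract_signatures:
        return TAG_SIGNATURES_EXTRACTED
    return TAG_SIGNATURES_KEPT


if __name__ == "__main__":
    # Example: the model needs reference numbers that sit in signatures.
    print(choose_transform_tag(extract_signatures=False))
```

The default keeps signature extraction on, in line with the recommendation to disable it only when the hidden content matters for the use case.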

Applying a transform tag

Exchange integrations: set the tag on the source

Sources are created automatically when you add a mailbox through an Exchange integration. Use the CLI to change the transform tag of an existing source:

re update source <project>/<source-name> --transform-tag <tag>

For example:

re update source DefaultProject/Demo --transform-tag generic_simp_mark_noop_setid.0.CHOJQ3XY

The update takes effect immediately. Emails already synced into the source are reprocessed with the new settings, which can take an hour or more for large sources. Check Warnings before changing the tag on a production source.

You can also set the tag when creating a source against an existing bucket:

re create source <project>/<source-name> --bucket <bucket> --transform-tag <tag>

If you specify an invalid tag, the platform rejects the request with a 422 error.
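When the change is scripted rather than typed, the update and create commands in this section can be assembled as argument lists. The Python sketch below only reproduces the arguments shown above. The function names are hypothetical, running the result assumes the re CLI is installed and configured, and it is an assumption that the CLI exits with a non-zero status when the platform rejects a tag.

```python
import subprocess
from typing import List


def update_source_command(project: str, source: str, tag: str) -> List[str]:
    """Arguments for changing the transform tag of an existing source."""
    return ["re", "update", "source", f"{project}/{source}",
            "--transform-tag", tag]


def create_source_command(project: str, source: str,
                          bucket: str, tag: str) -> List[str]:
    """Arguments for creating a source against an existing bucket."""
    return ["re", "create", "source", f"{project}/{source}",
            "--bucket", bucket, "--transform-tag", tag]


def run(command: List[str]) -> None:
    """Run the command and raise CalledProcessError on a non-zero exit.

    Assumption: a rejected tag makes the CLI exit with a non-zero status.
    """
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the command instead of running it, as a dry run.
    print(" ".join(update_source_command(
        "DefaultProject", "Demo", "generic_simp_mark_noop_setid.0.CHOJQ3XY")))
```

Passing the arguments as a list, rather than as one shell string, avoids quoting problems if a project or source name contains spaces.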

API uploads: pass the tag in the request

The sync-raw-emails and predict-raw-emails endpoints take a transform_tag parameter in the request body. Pass the same tag consistently for all uploads into a source, and use the same tag at prediction time that was used to upload the training data, so that the messages the model sees at runtime match the messages it was trained on.
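One way to keep the tag consistent between upload and prediction is to define it once and build both request bodies from the same value. The sketch below illustrates this in Python under stated assumptions: only the transform_tag field is confirmed by this page, while the documents field name and the helper names are placeholders to check against the API reference. It builds request bodies and sends nothing.

```python
import json
from typing import Any, Dict, List

# Defined once so that uploads and predictions cannot drift apart.
SOURCE_TRANSFORM_TAG = "generic_simple_markup_set_id.0.3LPWBXWR"


def raw_email_request_body(
    documents: List[Dict[str, Any]],
    transform_tag: str = SOURCE_TRANSFORM_TAG,
) -> Dict[str, Any]:
    """Build a request body that carries the transform tag.

    "transform_tag" is the parameter named on this page. "documents" is an
    assumed field name for the list of raw emails.
    """
    return {
        "transform_tag": transform_tag,
        "documents": documents,  # assumed field name
    }


if __name__ == "__main__":
    # Raw email payloads go here, in the shape the API reference defines.
    emails: List[Dict[str, Any]] = []
    sync_body = raw_email_request_body(emails)
    predict_body = raw_email_request_body(emails)
    # Both bodies carry the same tag because it comes from one constant.
    print(json.dumps(sync_body, indent=2))
    print(sync_body["transform_tag"] == predict_body["transform_tag"])
```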

Testing a new transform tag

Before changing the transform tag on a production source, test the new tag on a copy of your data:

Create a new source referencing the same bucket, with the new transform tag:

re create source <project>/<test-source-name> --bucket <bucket> --transform-tag <new-tag>

Create a new dataset, or duplicate your existing one, and add the new source to it.
Review how messages are parsed, and monitor Validation if you train on the new data.

Reprocessing existing data into a test source does not consume AI units.

Warnings

Changing the transform tag on an existing source reprocesses its data. Reprocessing can take an hour or more for large sources. Until it finishes, the dataset contains a mix of messages parsed with the old and new settings, which can temporarily lower the model score.
Changing how text is parsed can affect model performance. The model was trained on messages parsed with the old settings. Significant changes, such as disabling signature extraction or switching between plain text and markup, change the text the model sees, and validation scores may shift. Monitor Validation after the change, and retrain as necessary.
Production automations can break. If a stream on the affected dataset is consumed by a production automation, changing the message format, for example from plain text to markup, can break the downstream automation. Test the change on a separate source and dataset first, and update your automation before changing the production source.
