# Email transform tags

When email data is uploaded to Communications Mining™, either through an [Exchange integration](https://docs.uipath.com/ixp/automation-cloud/latest/cm-user-guide/using-exchange-integrations) or through the [`sync-raw-emails` API endpoint](https://docs.uipath.com/ixp/automation-cloud/latest/cm-user-guide/api-comments), the platform receives the raw MIME content of each email. MIME is the standard format that email providers use to send emails. The platform converts this raw content into the message you see in a source.

A **transform tag** is a named configuration that controls how this conversion happens. For Exchange integrations, the transform tag is set on the source. For the `sync-raw-emails` and `predict-raw-emails` API endpoints, it is passed as a parameter in each request.

:::note
Transform tags only apply to raw email uploads. They have no effect on data uploaded as a [CSV file](https://docs.uipath.com/ixp/automation-cloud/latest/cm-user-guide/uploading-a-csv-file-into-a-source) or as pre-parsed comments, including uploads made using the Communications Mining™ activities.
:::

## What transform tags control

A transform tag controls the following, among other settings:

* **Text format** - Whether messages are stored as plain text or with markup (rich text formatting).
* **Signature extraction** - Which model, if any, is used to detect and separate email signatures from the message body. Detected signatures are hidden from the model so that they do not add noise to predictions.
* **Default user properties** - Whether properties such as the mailbox name an email was synced from are set on each message automatically.

## Checking which transform tag a source uses

Use the [CLI](https://github.com/reinfer/cli) to list your sources, including the transform tag each one uses:

```bash
re get sources
```

The **Transform Tag** column shows the current tag for each source. A value of `missing` means the source has no transform tag set. Raw emails uploaded to such a source are processed with default settings.
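
To spot sources that rely on default settings, you can filter the listing for the `missing` value. The following is a minimal sketch: the helper name `sources_without_tag` is hypothetical, and it assumes that each source is printed on its own line of text.

```bash
# Hypothetical filter: keeps only the lines that contain the whole word `missing`.
# Caveat: a source whose name or title contains the word "missing" also matches.
sources_without_tag() {
  grep -w 'missing'
}

# Usage (requires the CLI to be installed and configured):
# re get sources | sources_without_tag
```

Treat the result as a starting point for review, and confirm each match against the full listing.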

## Available transform tags

If you do not specify a transform tag when a source is created, the platform applies a default tag. Tag names follow the format `<name>.<version>.<id>`. Always specify the full tag, including the version and ID, for example `generic_simple_markup_set_id.0.3LPWBXWR`.

| Tag | Features |
| --- | --- |
| `generic_simple_markup_set_id.0.3LPWBXWR` | Recommended for most sources. Uses markup. A machine learning model detects and removes signatures. Sets the mailbox name as a user property. |
| `generic_simp_mark_noop_setid.0.CHOJQ3XY` | As above, but with no signature extraction. The full email body, including signatures, is visible to the model. |

With both tags, only the newest message in an email chain is processed. The quoted history of previous messages is trimmed from the message body.

:::note
Older sources may use a transform tag that is not listed here, typically an older plain text tag. These older tags continue to work, but the markup tags listed above are preferred. Markup tags preserve the formatting of the original email, such as tables, which both the platform UI and generative extraction can take advantage of.
:::
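
Because a tag must always be given in full, a quick local check can catch a truncated tag before it is sent to the platform. The following is a minimal sketch: the helper name `is_full_transform_tag` is hypothetical, and the allowed characters are an assumption inferred from the two tags listed above. It checks only the `<name>.<version>.<id>` shape, not whether the tag exists.

```bash
# Hypothetical helper: succeeds if the argument has the <name>.<version>.<id> shape.
# The character classes are assumptions based on the tags shown on this page.
is_full_transform_tag() {
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9_]+\.[0-9]+\.[A-Z0-9]+$'
}

if is_full_transform_tag "generic_simple_markup_set_id.0.3LPWBXWR"; then
  echo "Tag has the expected three-part format"
fi
```

A tag that passes this check can still be rejected by the platform if it does not exist.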

## When to change the transform tag

For minimum disruption, check the transform tag your source currently uses and pick one with similar features, changing only the behavior you need. Common scenarios:

* **Turn off signature extraction** - Signature extraction occasionally hides content you want the model to read, for example reference numbers in the signature of a forwarded email. Switch from `generic_simple_markup_set_id.0.3LPWBXWR` to `generic_simp_mark_noop_setid.0.CHOJQ3XY`.

  :::warning
  Disabling signature extraction makes the entire signature visible to the model on every message, which introduces noise and can reduce model quality. Only disable it if the content being hidden has a high impact on your use case.
  :::
* **Enable markup** - If your source uses an older plain text tag and you see formatting issues, or want the model to read tables and other rich content, switch to `generic_simple_markup_set_id.0.3LPWBXWR`.
* **Route automations by mailbox** - Both tags listed above record the mailbox each email was synced from as a user property, which downstream automations can read, for example to route work items by region when several mailboxes share one source. Older tags may not set this property.

## Applying a transform tag

### Exchange integrations: set the tag on the source

Sources are created automatically when you add a mailbox through an Exchange integration. Use the [CLI](https://github.com/reinfer/cli) to change the transform tag of an existing source:

```bash
re update source <project>/<source-name> --transform-tag <tag>
```

For example:

```bash
re update source DefaultProject/Demo --transform-tag generic_simp_mark_noop_setid.0.CHOJQ3XY
```

The update takes effect immediately. Emails already synced into the source are reprocessed with the new settings, which can take an hour or more for large sources. Check [Warnings](#warnings) before changing the tag on a production source.

You can also set the tag when creating a source against an existing bucket:

```bash
re create source <project>/<source-name> --bucket <bucket> --transform-tag <tag>
```

If you specify an invalid tag, the platform rejects the request with a `422` error.

### API uploads: pass the tag in the request

The [`sync-raw-emails`](https://docs.uipath.com/ixp/automation-cloud/latest/cm-user-guide/api-comments) and [`predict-raw-emails`](https://docs.uipath.com/ixp/automation-cloud/latest/cm-user-guide/predictions) endpoints take a `transform_tag` parameter in the request body. Pass the same tag for all uploads into a source. At prediction time, use the tag that was used to upload the training data, so that the messages the model sees at runtime match the messages it was trained on.
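
One way to keep the tag consistent is to define it once and reuse it for every request. The following is a minimal sketch: only the `transform_tag` field comes from this page. The helper name `build_raw_email_body`, the empty `documents` placeholder, the `<endpoint-url>` placeholder, and the `API_TOKEN` variable are assumptions for illustration, so check the API reference for the exact request shape.

```bash
# Define the tag once so that upload and prediction requests cannot drift apart.
TRANSFORM_TAG="generic_simple_markup_set_id.0.3LPWBXWR"

# Hypothetical helper: prints a minimal JSON body that carries the transform tag.
# The `documents` array is an assumed placeholder for the raw emails.
build_raw_email_body() {
  printf '{"transform_tag": "%s", "documents": []}\n' "$1"
}

build_raw_email_body "$TRANSFORM_TAG"

# Sending the body (assumed header format; replace <endpoint-url> with the real endpoint):
# build_raw_email_body "$TRANSFORM_TAG" | curl -X POST "<endpoint-url>" \
#   -H "Authorization: Bearer $API_TOKEN" -H "Content-Type: application/json" --data @-
```

Building both the `sync-raw-emails` and the `predict-raw-emails` bodies from the same variable is a simple guard against training on one parsing configuration and predicting with another.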

## Testing a new transform tag

Before changing the transform tag on a production source, test the new tag on a copy of your data:

1. Create a new source referencing the same bucket, with the new transform tag:

   ```bash
   re create source <project>/<test-source-name> --bucket <bucket> --transform-tag <new-tag>
   ```
2. Create a new dataset, or duplicate your existing one, and add the new source to it.
3. Review how messages are parsed, and monitor Validation if you train on the new data.

Reprocessing existing data into a test source does not consume AI units.
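
Step 1 can be wrapped in a small script so that the command is reviewed before it runs. The following is a minimal sketch: the function name `create_test_source`, the `DRY_RUN` variable, and the test source name `Demo-tag-test` are hypothetical. The `re create source` command and its flags are the ones shown in step 1.

```bash
# Hypothetical wrapper around the documented `re create source` command.
# By default the command is only printed. Set DRY_RUN=0 to run it.
create_test_source() {
  project="$1"; test_source="$2"; bucket="$3"; new_tag="$4"
  set -- re create source "$project/$test_source" --bucket "$bucket" --transform-tag "$new_tag"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Replace "<bucket>" with the bucket that your production source uses.
create_test_source DefaultProject Demo-tag-test "<bucket>" generic_simp_mark_noop_setid.0.CHOJQ3XY
```

Printing the command first makes it easier to confirm that the test source, not the production source, is the one being created.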

## Warnings

* **Changing the transform tag on an existing source reprocesses its data.** Reprocessing can take an hour or more for large sources. Until it completes, the dataset contains a mix of messages parsed with the old and new settings, which can temporarily lower the model score.
* **Changing how text is parsed can affect model performance.** The model was trained on messages parsed with the old settings. Significant changes, such as disabling signature extraction or switching between plain text and markup, change the text the model sees, and validation scores may shift. Monitor Validation after the change, and retrain as necessary.
* **Production automations can break.** If a stream on the affected dataset is consumed by a production automation, changing the message format, for example from plain text to markup, can break the downstream automation. Test the change on a separate source and dataset first, and update your automation before changing the production source.
