Communications Mining Developer Guide
Comments
The developer guide and the API refer to comments, whilst the user guide and the Communications Mining UI refer primarily to messages.
This page explains how to format your data as comments to prepare it for upload, and how to understand data fetched from Communications Mining.
The Overview section describes the overall structure of a comment object. If you want to upload data to Communications Mining via the API, or to understand how to process data uploaded to Communications Mining via the API, check the Comments created via the API section. You can find detailed descriptions for each of the commonly used types of comments (emails or support tickets). If you want to better understand how to process data uploaded to Communications Mining via an integration, check the Comments created by integrations section. Finally, for a full list of available comment object fields, check the Reference section.
Communications Mining works with various types of text data such as emails, survey responses, support tickets, or customer reviews. What these types of data have in common is that they all consist of units of communication (an email, a survey response, a support ticket, a customer review). In Communications Mining, each such unit of communication is represented as a comment.
No matter what kind of communication unit a comment represents, it always has the same basic structure:
{
"id": <UNIQUE ID>,
"timestamp": <TIMESTAMP>,
"messages": [
{
"body": { "text": <TEXT> },
...
}
],
"user_properties": { ... },
}
As shown in the code snippet above, in addition to the actual piece of text, a comment always has an ID and a timestamp. The ID needs to be unique within the source containing the message. The timestamp is used in the platform UI to filter and sort by date, and to generate date-based analytics.
In addition to these required fields, other fields should be set depending on the type of the comment. If your data has been uploaded to Communications Mining via an integration, Communications Mining automatically populates all necessary fields. Check the following sections for a more detailed description.
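As a minimal sketch of the structure described above, the following Python helper assembles a comment dictionary with an ID, a timestamp, and a single message. The function name `make_comment` and its parameters are illustrative assumptions and are not part of any Communications Mining SDK.

```python
import json
from datetime import datetime, timezone


def make_comment(comment_id, timestamp, text, user_properties=None):
    """Assemble a comment dict with the required id, timestamp and message text.

    Hypothetical helper for illustration only.
    """
    if timestamp.tzinfo is None:
        # Treat naive datetimes as UTC
        timestamp = timestamp.replace(tzinfo=timezone.utc)
    comment = {
        "id": comment_id,
        "timestamp": timestamp.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "messages": [{"body": {"text": text}}],
    }
    if user_properties:
        comment["user_properties"] = dict(user_properties)
    return comment


comment = make_comment(
    "dbcb03ad",
    datetime(2020, 2, 26, 16, 9, tzinfo=timezone.utc),
    "Hi Support Team",
    {"string:Topic": "Broadband"},
)
print(json.dumps(comment, indent=2))
```

Formatting the timestamp explicitly in UTC avoids any ambiguity about the timezone when the comment is later filtered or sorted by date.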
Emails
To upload emails via the API, use the `sync-raw-emails` endpoint for raw emails, and the `sync` endpoint for processed emails.
When syncing raw emails, provide the extracted MIME email headers and email body as-is (check the Reference for a description of the raw email format). Communications Mining parses the headers and cleans the email body.
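The sketch below shows one way to split a stored email into the raw headers string and plain-text body that a raw email payload expects, using Python's standard `email` package. The helper name `raw_email_document` is hypothetical, and the sketch assumes a single-part, plain-text message with unfolded headers; multipart or encoded emails would need extra handling.

```python
from email import message_from_string


def raw_email_document(eml_text):
    """Split an RFC 822 message into a raw headers string and a plain body.

    Hypothetical helper; only handles single-part plain-text emails.
    """
    msg = message_from_string(eml_text)
    if msg.is_multipart():
        raise ValueError("this sketch only handles single-part emails")
    # One header per line, in the order they appeared in the message
    raw_headers = "\n".join(f"{name}: {value}" for name, value in msg.items())
    return {
        "raw_email": {
            "headers": {"raw": raw_headers},
            "body": {"plain": msg.get_payload()},
        }
    }


eml = (
    "From: Alice Smith <alice@example.com>\n"
    "Subject: Figures for today\n"
    "To: Bob <bob@company.com>\n"
    "\n"
    "Hi Bob,\n\nCould you send me the figures for today?\n\nThanks,\nAlice"
)
document = raw_email_document(eml)
print(document["raw_email"]["headers"]["raw"])
```

The body is passed through unchanged, signature included, because the platform is described as doing the cleaning itself.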
How does Communications Mining process raw emails?
- Sets the email-specific fields in the message object `messages[0]`
- Sets the `thread_id` field and `thread_properties` object
- Cleans up the email body by stripping quoted emails and putting the signature into a separate `signature` field
- Populates the `user_properties` object with metadata extracted from email headers.
BCC:
field.
If you enrich emails with other data prior to uploading to Communications Mining, you can provide this additional data in the user properties of the comment.
{
"raw_email": {
"body": {
"plain": "Hi Bob,\n\nCould you send me the figures for today?\n\nThanks,\nAlice"
},
"headers": {
"raw": "From: Alice Smith <alice@example.com>\nDate: Tue, 3 Aug 2021 10:57:42 +0100\nMessage-ID: <e7784b5b@mail.example.com>\nSubject: Figures for today\nTo: Bob <bob@company.com>\nCc: Joe <joe@company.com>"
}
},
"user_properties": {
"string:Team": "Team XYZ"
}
}
{
"comment": {
"id": "3c6537373834623562406d61696c2e6578616d706c652e636f6d3e",
"timestamp": "2021-08-03T09:57:42Z",
"user_properties": {
"string:Has Signature": "Yes",
"string:Sender": "alice@example.com",
"string:Thread": "<e7784b5b@mail.example.com>",
"string:Message ID": "<e7784b5b@mail.example.com>",
"number:Recipient Count": 2,
"number:Participant Count": 3,
"number:Position in Thread": 1,
"string:Sender Domain": "example.com",
"string:Team": "Team XYZ"
},
"messages": [
{
"body": {
"text": "Hi Bob,\n\nCould you send me the figures for today?"
},
"signature": {
"text": "Thanks,\nAlice"
},
"subject": {
"text": "Figures for today"
},
"to": ["\"Bob\" <bob@company.com>"],
"cc": ["\"Joe\" <joe@company.com>"],
"sent_at": "2021-08-03T09:57:42Z",
"from": "\"Alice Smith\" <alice@example.com>"
}
],
"thread_id": "3c6537373834623562406d61696c2e6578616d706c652e636f6d3e"
},
"thread_properties": {
"duration": null,
"response_time": null,
"num_messages": 1,
"num_participants": 3,
"first_sender": "alice@example.com",
"thread_position": 0
}
}
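In this sample, the comment `id` and `thread_id` happen to be the hexadecimal encoding of the email's `Message-ID` header. This is an observation about the example data rather than a documented guarantee, so the sketch below is illustrative only.

```python
# Message-ID taken from the example email headers
message_id = "<e7784b5b@mail.example.com>"

# Hex-encode the UTF-8 bytes of the header value
comment_id = message_id.encode("utf-8").hex()
print(comment_id)

# Decoding the hex string recovers the original header value
decoded = bytes.fromhex(comment_id).decode("utf-8")
print(decoded)
```

A hex-encoded value satisfies the constraint that comment IDs consist only of the characters `0-9` and `a-f`.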
Thread Properties
The following thread properties are available.
NAME | DESCRIPTION |
---|---|
thread_position | Position of comment in thread, calculated by ordering the comments by `timestamp`. Starts at `0`. |
num_messages | Number of comments in thread. |
num_participants | Total number of unique participants (From, To, CC, BCC) in thread. |
first_sender | Sender of the first comment in thread. |
duration | Difference (in seconds) between the timestamps of first and last comment in thread. Will be set to `null` if `num_messages` is 1 (i.e. thread contains only 1 comment). Note: The `timestamp` of a comment corresponds to the `sent_at` field of the corresponding raw email. |
response_time | Difference (in seconds) between the first comment in thread and the first response in thread. The first response in thread is the oldest comment where sender is not `first_sender`. Will be set to `null` if there are no responses in thread (i.e. if all emails in thread are from the same sender). |
Each time a new comment is added to the platform, the thread properties of the corresponding thread are updated.
Except for `thread_position`, all properties are the same for each comment in a thread.
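To make the definitions in the table concrete, here is a rough sketch that derives the same properties from a list of comments. It is not the platform's implementation: the input shape (dicts with `timestamp`, `sender`, and `recipients` keys) and the function name are assumptions chosen for illustration.

```python
from datetime import datetime


def thread_properties(comments):
    """Return one thread-properties dict per comment, in timestamp order.

    Hypothetical re-implementation of the table's definitions.
    """
    ordered = sorted(comments, key=lambda c: c["timestamp"])
    first = ordered[0]

    # Unique participants across senders and all recipients
    participants = set()
    for c in ordered:
        participants.add(c["sender"])
        participants.update(c.get("recipients", []))

    # duration is null (None) for a single-comment thread
    duration = None
    if len(ordered) > 1:
        duration = int((ordered[-1]["timestamp"] - first["timestamp"]).total_seconds())

    # response_time: first comment whose sender differs from the first sender
    response_time = None
    for c in ordered:
        if c["sender"] != first["sender"]:
            response_time = int((c["timestamp"] - first["timestamp"]).total_seconds())
            break

    shared = {
        "num_messages": len(ordered),
        "num_participants": len(participants),
        "first_sender": first["sender"],
        "duration": duration,
        "response_time": response_time,
    }
    # Only thread_position differs between comments of the same thread
    return [dict(shared, thread_position=i) for i in range(len(ordered))]


thread = [
    {
        "timestamp": datetime(2021, 8, 3, 9, 57, 42),
        "sender": "alice@example.com",
        "recipients": ["bob@company.com", "joe@company.com"],
    },
    {
        "timestamp": datetime(2021, 8, 3, 10, 27, 42),
        "sender": "bob@company.com",
        "recipients": ["alice@example.com"],
    },
]
props = thread_properties(thread)
print(props[1])
```

Because the shared values depend on every comment in the thread, they have to be recomputed whenever a comment is added.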
In addition to the main text, a typical support ticket submitted via a form may have a subject, information about the sender (such as name or email address), and additional structured data (such as the topic of the ticket) which can be uploaded as part of the user properties of the comment.
{
"id": "dbcb03ad",
"timestamp": "2020-02-26T16:09:00Z",
"messages": [
{
"body": {
"text": "Hi Support Team\n\nPlease could you look into my broadband service network status. I don't have any signal."
},
"subject": {
"text": "Network Outage for over 24 hours - Customer account number 1234567"
},
"from": "alice.smith@example.com"
}
],
"user_properties": {
"string:Customer Name": "Alice Smith",
"string:Source": "Support Form",
"string:Topic": "Broadband"
}
}
Emails (Microsoft Exchange)
Microsoft Exchange emails ingested into Communications Mining via the Exchange integration are automatically converted into comment objects in the same way as raw emails.
If an email has attachments, the `attachments` field contains metadata about them:

```json
{
  "id": "3c484531505230324d423",
  "attachments": [
    {
      "name": "account-statement.pdf",
      "size": 49078,
      "content_type": "application/pdf"
    }
  ],
  // other comment fields omitted
  ...
}
```
If the attachment's content is available for download, the attachment also has an `attachment_reference` field:
```json
{
"id": "3c484531505230324d423",
"attachments": [
{
"name": "account-statement.pdf",
"size": 49078,
"content_type": "application/pdf",
"attachment_reference": "CjQSEIExTHEqtdntoxz2WtbZDNEiIIVqcP1Sfx2L4epyRQDasa1RSODvheQ3bvLhj3L-_81G"
}
],
// other comment fields omitted
...
},
```
Use the `attachment_reference` to retrieve the binary file content from [the attachments API](#FIXME). For the example above, you fetch the following URL:
https://cloud.uipath.com/<organisation>/<tenant>/reinfer_/api/v1/attachments/CjQSEIExTHEqtdntoxz2WtbZDNEiIIVqcP1Sfx2L4epyRQDasa1RSODvheQ3bvLhj3L-_81G
Check the [API Reference](#FIXME) for further details about this type of request.
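The URL pattern above can be assembled programmatically. The following sketch only builds the URL string; authentication and the HTTP request itself are out of scope here, and the function name and the `my-org` / `my-tenant` values are placeholders.

```python
from urllib.parse import quote


def attachment_url(organisation, tenant, attachment_reference):
    """Build the attachment download URL following the pattern shown above.

    Hypothetical helper; each path segment is percent-encoded defensively.
    """
    return (
        "https://cloud.uipath.com/"
        f"{quote(organisation, safe='')}/{quote(tenant, safe='')}"
        f"/reinfer_/api/v1/attachments/{quote(attachment_reference, safe='')}"
    )


reference = "CjQSEIExTHEqtdntoxz2WtbZDNEiIIVqcP1Sfx2L4epyRQDasa1RSODvheQ3bvLhj3L-_81G"
url = attachment_url("my-org", "my-tenant", reference)
print(url)
```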
If an attachment has no `attachment_reference` property, you can't download the attachment's content. This may be because:
- Communications Mining didn't receive the attachment's content.
- The attachment content exceeded the size limit for uploading to Communications Mining.
- Communications Mining processed the attachment before it supported file contents.
Learn more about the Attachment contents on the Attachment page.
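When processing fetched comments, it can help to separate attachments whose content can be downloaded from those that only carry metadata. A small sketch, with a hypothetical helper name:

```python
def split_attachments(comment):
    """Partition a comment's attachments by whether content can be downloaded."""
    downloadable, metadata_only = [], []
    for attachment in comment.get("attachments", []):
        # Content is only retrievable when attachment_reference is present
        if attachment.get("attachment_reference"):
            downloadable.append(attachment)
        else:
            metadata_only.append(attachment)
    return downloadable, metadata_only


comment = {
    "id": "3c484531505230324d423",
    "attachments": [
        {
            "name": "account-statement.pdf",
            "size": 49078,
            "content_type": "application/pdf",
            "attachment_reference": "CjQSEIExTHEqtdntoxz2WtbZDNEiIIVqcP1Sfx2L4epyRQDasa1RSODvheQ3bvLhj3L-_81G",
        },
        {"name": "logo.png", "size": 1024, "content_type": "image/png"},
    ],
}
downloadable, metadata_only = split_attachments(comment)
print([a["name"] for a in downloadable], [a["name"] for a in metadata_only])
```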
Comments
See the table below for a list of available comment fields. If you are unfamiliar with Communications Mining comment objects, check the Overview.
NAME | TYPE | REQUIRED | DESCRIPTION |
---|---|---|---|
id | string | yes | Identifies a comment uniquely within a source. Any hexadecimal string of up to 1024 characters is valid (conforms to /[0-9a-f]{1,1024}/). |
timestamp | string | yes | An ISO-8601 timestamp indicating when the comment was created. If the timestamp does not specify a timezone, UTC will be assumed. The timestamp must be in the range 1950-01-01T00:00:00Z to 2049-12-31T23:59:59Z inclusive. |
messages | array<Message> | yes | An array of zero or one message. |
user_properties | map<string, string \| number> | no | Any user-defined metadata that applies to the comment. There are two possible types: `string` and `number`. The key of a user property has the format "type:name", e.g. "string:Domain Name" or "number:Star Rating". The user property name may consist of letters, numbers, spaces, and underscores, and may contain up to 32 characters (conforms to /\w([\w ]{0,30}\w)?/). The value must be a string or a number depending on the type of the user property. |
thread_id | string | no | An ID uniquely identifying an email thread. Any hexadecimal string of up to 1024 characters is valid (conforms to /[0-9a-f]{1,1024}/). |
uid | string | set by Communications Mining | A combined source and comment ID in the form of `source_id.comment_id`. Do not set this field directly; Communications Mining generates it automatically for uploaded comments. |
created_at | string | set by Communications Mining | An ISO-8601 timestamp with the same constraints as the `timestamp` field. Do not set this field directly; Communications Mining generates it automatically when the comment is created. |
updated_at | string | set by Communications Mining | An ISO-8601 timestamp with the same constraints as the `timestamp` field. Do not set this field directly; Communications Mining generates it automatically when the comment is updated. |
attachments | array<Attachment> | no | An array of zero or more attachments. An attachment represents a file attached to a comment. |
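The constraints in the table can be checked locally before uploading. The sketch below is a best-effort reading of those constraints, not an official validator: the function names are hypothetical, `re.ASCII` is a conservative assumption about which letters are allowed in property names, and `datetime.fromisoformat` accepts only a subset of ISO-8601.

```python
import re
from datetime import datetime, timezone

ID_RE = re.compile(r"[0-9a-f]{1,1024}")
# Assumption: restrict \w to ASCII letters, digits and underscore
NAME_RE = re.compile(r"\w([\w ]{0,30}\w)?", re.ASCII)
MIN_TS = datetime(1950, 1, 1, tzinfo=timezone.utc)
MAX_TS = datetime(2049, 12, 31, 23, 59, 59, tzinfo=timezone.utc)


def parse_timestamp(value):
    """Parse an ISO-8601 string, assuming UTC when no timezone is given."""
    if value.endswith("Z"):
        # fromisoformat only accepts a trailing "Z" from Python 3.11 onwards
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def comment_errors(comment):
    """Return a list of human-readable problems found in a comment dict."""
    errors = []
    if not ID_RE.fullmatch(comment.get("id", "")):
        errors.append("id must be 1 to 1024 lowercase hexadecimal characters")
    try:
        if not MIN_TS <= parse_timestamp(comment.get("timestamp", "")) <= MAX_TS:
            errors.append("timestamp is outside the supported range")
    except ValueError:
        errors.append("timestamp is not a valid ISO-8601 string")
    for key, value in comment.get("user_properties", {}).items():
        kind, _, name = key.partition(":")
        if kind not in ("string", "number") or not NAME_RE.fullmatch(name):
            errors.append(f"invalid user property key: {key!r}")
        elif kind == "string" and not isinstance(value, str):
            errors.append(f"{key!r} must have a string value")
        elif kind == "number" and (
            isinstance(value, bool) or not isinstance(value, (int, float))
        ):
            errors.append(f"{key!r} must have a numeric value")
    return errors


valid = {
    "id": "dbcb03ad",
    "timestamp": "2020-02-26T16:09:00Z",
    "messages": [{"body": {"text": "Hi Support Team"}}],
    "user_properties": {"string:Topic": "Broadband", "number:Star Rating": 4},
}
print(comment_errors(valid))
```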
`Attachment` has the following format:
NAME | TYPE | REQUIRED | DESCRIPTION |
---|---|---|---|
name | string | yes | The attachment's file name. |
size | number | yes | The size of the attachment's file content in bytes. |
content_type | string | yes | The [Media type](https://en.wikipedia.org/wiki/Media_type) of the attachment. For a list of possible values, see the [IANA Media Types](https://www.iana.org/assignments/media-types/media-types.xhtml) list. |
attachment_reference | string | no | Used to retrieve the binary file content from [the attachments API](#FIXME). |
`Message` has the following format:
NAME | TYPE | REQUIRED | DESCRIPTION |
---|---|---|---|
body | Content | yes | An object containing the main body text of the message. |
subject | Content | no | An object containing the message's subject. |
signature | Content | no | An object containing the message's signature. |
from | string | no | The message sender. |
to | array<string> | no | An array of primary recipients. |
cc | array<string> | no | An array of carbon-copy recipients. |
bcc | array<string> | no | An array of blind carbon-copy recipients. |
sent_at | string | no | An ISO-8601 timestamp indicating when the message was created. If the timestamp does not specify a timezone, UTC will be assumed. |
language | string | no | The original language of the message. If this is supplied, both `text` and `translated_from` should be supplied for the Content fields. |
`Content` has the following format:
NAME | TYPE | REQUIRED | DESCRIPTION |
---|---|---|---|
text | string | yes | If `language` (other than the source's `language`) has been supplied, this should be the translated text of the content. Otherwise, it should be in the original language in which it was collected; it will be translated if it is not in the source's language and the source has `should_translate` set to `true`. Maximum 65536 characters. |
translated_from | string | no | If `language` (other than the source's `language`) has been supplied, this should be the original text of the content. Supplying this field without having supplied a `language` will result in an error. At most 65536 characters. |
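The relationship between `language`, `text`, and `translated_from` can be sketched as follows. The helper names are hypothetical, the language value `"de"` is a placeholder (the accepted language code format is not specified here), and the length check counts Python code points, which may differ from how the platform counts characters.

```python
MAX_CONTENT_CHARS = 65536


def make_content(text, translated_from=None):
    """Build a Content object, enforcing the documented length limit."""
    for value in (text, translated_from):
        if value is not None and len(value) > MAX_CONTENT_CHARS:
            raise ValueError("content exceeds 65536 characters")
    content = {"text": text}
    if translated_from is not None:
        content["translated_from"] = translated_from
    return content


def make_message(body_text, language=None, original_body=None):
    """Build a Message whose body may carry both translated and original text."""
    if original_body is not None and language is None:
        # translated_from without language is described as an error
        raise ValueError("translated_from requires language to be set")
    message = {"body": make_content(body_text, original_body)}
    if language is not None:
        message["language"] = language
    return message


message = make_message("Hello", language="de", original_body="Hallo")
print(message)
```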
Raw Emails
See the table below for a list of available raw email fields.
NAME | TYPE | REQUIRED | DESCRIPTION |
---|---|---|---|
headers | Headers | yes | An object containing the headers of the email. |
body | Body | yes | An object containing the main body of the email. |
`Headers` has the following format:
NAME | TYPE | REQUIRED | DESCRIPTION |
---|---|---|---|
raw | string | no | One of `raw` and `parsed` is required. The raw email headers, given as a single string, with each header on its own line. |
parsed | map<string, string \| array<string>> | no | One of `raw` and `parsed` is required. The parsed email headers, given as an object with string keys and string or array<string> values. Each key must be ASCII, and represents one email header. Value strings may be any valid UTF-8. Lists of values will be concatenated with `,` before being set as a single header value. If you require duplicate header keys, please use `raw` instead. |
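Since `parsed` cannot express duplicate header keys, a client may want to choose between the two representations automatically. The sketch below does that for a list of `(name, value)` pairs; the function name is hypothetical, and treating header names as case-insensitive when looking for duplicates is an assumption.

```python
def build_headers(pairs):
    """Return a Headers object, using `parsed` unless a header name repeats."""
    names = [name for name, _ in pairs]
    for name in names:
        if not name.isascii():
            raise ValueError(f"header name must be ASCII: {name!r}")
    # Assumption: header names are compared case-insensitively
    if len({name.lower() for name in names}) == len(names):
        return {"parsed": {name: value for name, value in pairs}}
    # Duplicate keys: fall back to one header per line in a single string
    return {"raw": "\n".join(f"{name}: {value}" for name, value in pairs)}


unique = build_headers([
    ("From", "Alice Smith <alice@example.com>"),
    ("To", ["Bob <bob@company.com>", "Joe <joe@company.com>"]),
])
repeated = build_headers([
    ("Received", "from a.example.com"),
    ("Received", "from b.example.com"),
    ("Subject", "Figures for today"),
])
print(unique)
print(repeated)
```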
`Body` has the following format:
NAME | TYPE | REQUIRED | DESCRIPTION |
---|---|---|---|
plain | string | no | At least one of `plain` and `html` is required. The plaintext content of the email. At most 65536 characters. |
html | string | no | At least one of `plain` and `html` is required. The HTML content of the email. |