Real-time automation
In this hands-on tutorial we will be building a simple automated triage application that uses Communications Mining to categorize incoming emails in real time. We'll build an end-to-end workflow that can be used as a starting point for your own Communications Mining automation, and take a detailed look at how to use the real-time Stream API.
Before starting this tutorial, please make sure you are familiar with Communications Mining concepts and terminology and Communications Mining API basics.
You need the following permissions in order to follow the tutorial. You can check your current permissions on your Manage Account page.
PROJECT | DESCRIPTION | PERMISSIONS |
---|---|---|
reinfer-sandbox | Contains the pre-annotated reinfer-sandbox/integration-tutorial dataset used in this tutorial. | "View sources", "View labels" |
Your development project | During your onboarding, you should have received access to a project that you can use as your development environment. | "Stream admin", "Consume streams", "View sources", "View labels" |
To follow the tutorial with the pre-annotated reinfer-sandbox/integration-tutorial dataset, create a new dataset in your development project using the "Copy an existing taxonomy" option. You can find instructions on how to do that here.
Since your new dataset contains annotated data, the model will start training immediately. You can track the model training status in the dataset status bar. Once done, performance metrics for each label will appear on the Validation page, and a new model version will appear on the Models page.
Now that you are familiar with the prerequisites, let's start building our end-to-end workflow. In this section we will discuss the design of a simple automated triage application and its integration with Communications Mining. In the next sections we will learn about the Stream API that will drive our automation. Finally, we will build our application based on the design here, and test it using pre-annotated data.
We will target a typical email support use-case as the starting point for our design:
- An Outlook support mailbox receives a large number of customer emails daily.
- A triage team turns each email into a support ticket. This requires populating ticket fields with information from the email (eg. a customer ID). Each ticket is then added to the appropriate workflow queue.
- The tickets in the workflow queues are continuously processed by a customer support team.
Figure 3. A simple email support use-case
There are two automation opportunities here: the triage step and the processing step. This tutorial will demonstrate how to automate the triage step by using Communications Mining to extract required fields from the email, and assign the email to a workflow queue.
Due to the live connection between the Exchange server and Communications Mining, Communications Mining can serve as a data source for your application. This way a separate connection between your application and the Exchange server is not needed. Your application will continuously poll Communications Mining for new emails, and receive them together with their predicted labels and general fields. (We assume that no users are working directly in the mailbox's inbox at the same time as your application is running; otherwise you would need to account for conflicts between your application and mailbox users).
Your application will query Communications Mining and, for each email, check whether the required labels and general fields are present in the API response. If yes, it will create a ticket in the appropriate workflow queue. If not, it will make a second API request to Communications Mining to mark the email as a "no prediction" exception. Similarly, there should be a way for users processing the tickets to report miscategorized tickets so that the corresponding emails can be marked in Communications Mining as a "wrong prediction" exception. (Both exception types will then be reviewed and annotated by the model maintainer in order to improve model performance).
Parts of the design (shown in the diagram with a dotted outline) will be out of scope for this tutorial. In a real-life scenario, these steps should of course not be skipped:
- We will be using existing data in the platform instead of setting up a live EWS connection.
- The data comes pre-annotated, so we won't need to train a model.
- We won't design a feedback loop for "wrong prediction" exceptions since the design depends on the capabilities of the system where tickets are processed.
The recommended option for getting email data into Communications Mining is to use the Communications Mining EWS connector, but other options are also available. Since we are using data that is already in the platform, setting up data ingestion is not part of this tutorial. You can learn more about all available data ingestion options here.
We would like to automate this process:
A triage team turns each email into a support ticket. This requires populating ticket fields with information from the email (eg. a customer ID). Each ticket is then added to the appropriate workflow queue.
For the sake of this tutorial, let's assume that our workflow queues are "Renewal", "Cancellation", "Admin", and "Urgent". Emails concerning renewal, cancellation, and admin tasks (eg. address changes) are supposed to go into the respective queues, while all urgent emails should go into the "Urgent" queue regardless of topic.
In order to be able to categorize emails into the four workflow queues, the model has been trained to predict the labels "Renewal", "Cancellation", "Admin", and "Urgent". In order to extract the customer ID, a "Customer ID" general field has been configured. (Communications Mining comes with many pre-built general field kinds; further general field kinds can be added based on the needs of your specific integration. You can see a list of currently available general fields here, and learn about requesting new general field kinds here).
We can now come up with a mapping between the predicted label(s) received from Communications Mining and the workflow queue the email should go into:
IF number of labels == 0 THEN put email into "Uncategorised" queue
ELSE IF one of labels is "Urgent" THEN put email into "Urgent" queue
ELSE
    IF number of labels == 1 THEN put email into the respective queue
    ELSE put email into "Uncategorised" queue
We made a few choices for the sake of the tutorial:
- In addition to the existing four workflow queues there is a special "Uncategorised" queue. If the model is not able to provide a prediction, we put the email there to be manually processed. Alternatively we could have picked an existing queue that should deal with all uncategorised emails, for example "Admin".
- If an email has more than one label from the set of ["Renewal", "Cancellation", "Admin"], it means that it contains multiple requests. We choose to put such emails into the "Uncategorised" queue, perhaps because we don't expect to receive many of them. Alternatively we could have created a "Complex Requests" queue.
In a real-life scenario, you should base such decisions on the specific requirements of your use case.
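The mapping above can be sketched in Python as follows. This is a minimal illustration under the tutorial's assumptions, not a definitive implementation: the function name `choose_queue` and the constant `TOPIC_QUEUES` are hypothetical, and the input is assumed to be the list of label names that already passed their confidence thresholds.

```python
from typing import List

# Queues that correspond one-to-one to a topic label (tutorial assumption).
TOPIC_QUEUES = {"Renewal", "Cancellation", "Admin"}


def choose_queue(labels: List[str]) -> str:
    """Map the predicted labels of one email to a workflow queue."""
    if not labels:
        # No prediction at all: send to manual processing.
        return "Uncategorised"
    if "Urgent" in labels:
        # Urgent emails go to the "Urgent" queue regardless of topic.
        return "Urgent"
    if len(labels) == 1 and labels[0] in TOPIC_QUEUES:
        # Exactly one topic label: use the queue of the same name.
        # The membership check is an extra guard against unexpected labels.
        return labels[0]
    # Multiple topic labels (or an unknown label): manual processing.
    return "Uncategorised"
```

Keeping the mapping in one small pure function makes it easy to change the choices discussed above, for example by returning "Complex Requests" in the last branch.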
In order to query a model for predictions you of course need to have a trained model. A model is trained by annotating some of the data you ingested. Since multiple hours of annotating are required in order to produce a model that performs well, we will be using pre-annotated data in this tutorial so that you won't need to train your own model.
In a real-life scenario, a model trainer should have good domain knowledge of the data. For example, the user of a support mailbox would be a good model trainer to label the data coming from that mailbox. The training needs to be done carefully in order to produce a model that performs well and is not biased. To that end, Communications Mining provides training resources and offers hands-on training workshops.
Even a well-performing model will occasionally provide incorrect results, either by failing to predict a label, or by predicting the wrong label. One of the best ways to improve the model is to annotate the emails the model doesn't perform well on. For this reason, we want to have a feedback loop for such emails:
For each email, your application checks whether required labels and general fields are present. If yes, it creates a ticket in the appropriate workflow queue. If not, it makes a second API request to Communications Mining to mark the email as a "no prediction" exception. Similarly, there should be a way for users processing the tickets to report miscategorized tickets so that the corresponding emails can be marked in Communications Mining as a "wrong prediction" exception.
Our design shows feedback loops for both types of exceptions.
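As a rough sketch, the check that separates tickets from "no prediction" exceptions could look like the function below. The function name, the returned strings, and the idea of passing the customer ID as an optional string are all assumptions made for illustration; how you extract the customer ID from the general fields depends on the response format described in the API reference.

```python
from typing import List, Optional


def triage_decision(labels: List[str], customer_id: Optional[str]) -> str:
    """Decide what to do with one email.

    Returns "create_ticket" when at least one label passed its threshold
    and a customer ID was extracted, otherwise "no_prediction_exception".
    """
    if labels and customer_id:
        return "create_ticket"
    return "no_prediction_exception"
```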
Our design shows workflow queues in an abstract way. In reality, you might be pushing the emails directly into a CRM platform, use a message broker such as Kafka, or even simply move the emails from the inbox folder into a subfolder. For the purposes of this tutorial, we will mock up the queues, but you are encouraged to develop your test integration end-to-end.
In order to fetch incoming emails together with predicted labels and extracted general fields, we will use the Stream API. The Stream API allows you to define a stream of comments based on a dataset, pinned model version, and optional comment filters, and to iterate through them in a stateful way.
Each result in the Stream response will contain a comment, a list of predicted labels, and a list of general fields. This is passed as a JSON structure as seen below:
{
"comment": {...},
"entities": [...],
"labels": [...],
...
}
The following section explains how to correctly interpret the predicted labels in each stream response.
Confidence Scores
The Stream endpoint will return predicted labels together with a confidence score (a number between 0 and 1). For example, the snippet below would be for predictions of "Cancellation" and "Admin" with confidences of about 0.84 and 0.02 respectively:
"labels": [
{
"name": ["Cancellation"],
"probability": 0.8374786376953125
},
{
"name": ["Admin"],
"probability": 0.0164003014564514
}
]
In order to correctly interpret such a result, you need to determine the minimum confidence score at which you will treat the prediction as saying "yes, the label applies". We call this number the confidence score threshold.
In order to understand confidence thresholds you should be familiar with the terms precision and recall. You can find an explanation of these terms on our support pages. Briefly, a high precision relates to a low false positive rate (i.e. your results are more likely to be accurate), and a high recall relates to a low false negative rate (i.e. you are less likely to miss relevant results).
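To make the two terms concrete, here is a small sketch that computes them from counts of true positives, false positives, and false negatives. The counts in the usage note are made-up illustrative numbers, not figures from the tutorial dataset, and returning 0.0 when the denominator is zero is just one possible convention.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of predicted labels that were correct."""
    predicted = true_positives + false_positives
    return true_positives / predicted if predicted else 0.0


def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of labels that should have been predicted and were."""
    actual = true_positives + false_negatives
    return true_positives / actual if actual else 0.0
```

For example, with 5 correct predictions, 1 incorrect prediction, and no missed emails, `precision(5, 1)` is 5/6 (about 83%) and `recall(5, 0)` is 1.0 (100%).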
Using the interactive slider on the Validation page, you can quickly find your desired threshold: move the slider to the right to optimize for precision, or to the left to optimize for recall, until you find the precision and recall that match your application requirements. The displayed threshold value will be your desired threshold. If you want to learn more about the Validation page functionality, please see the support pages.
If you look through the Validation page you might notice that the shapes of the precision-recall curves are different for each label. This gives you a hint as to how we will be picking thresholds: we will pick an individual threshold for each label. This is particularly important in automation applications where the best performance must be ensured.
We'll be using the following threshold values for the rest of the tutorial.
Admin: 0.898 (corresponds to 100% precision at 100% recall)
Cancellation: 0.619 (corresponds to 100% precision at 100% recall)
Renewal: 0.702 (corresponds to 100% precision at 100% recall)
Urgent: 0.179 (corresponds to 83% precision at 100% recall)
Applying these thresholds to the earlier example, the "Cancellation" label applies since 0.8374786376953125 > 0.619. The "Admin" label doesn't apply since 0.0164003014564514 < 0.898.
"labels": [
{
"name": ["Cancellation"],
"probability": 0.8374786376953125
},
{
"name": ["Admin"],
"probability": 0.0164003014564514
}
]
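If you apply the thresholds in your own code, the comparison can be sketched as below. The function name is hypothetical, and joining hierarchical label names with " > " is an assumption made here so that each label can be looked up by a single string; labels without a configured threshold are ignored in this sketch.

```python
from typing import Dict, List

# Threshold values chosen earlier in this tutorial.
THRESHOLDS = {"Admin": 0.898, "Cancellation": 0.619, "Renewal": 0.702, "Urgent": 0.179}


def labels_above_threshold(predictions: List[dict], thresholds: Dict[str, float]) -> List[str]:
    """Return the names of labels whose probability exceeds their threshold."""
    applied = []
    for prediction in predictions:
        # "name" is a list of path components, e.g. ["Parent Label", "Child Label"].
        name = " > ".join(prediction["name"])
        threshold = thresholds.get(name)
        if threshold is not None and prediction["probability"] > threshold:
            applied.append(name)
    return applied


example = [
    {"name": ["Cancellation"], "probability": 0.8374786376953125},
    {"name": ["Admin"], "probability": 0.0164003014564514},
]
print(labels_above_threshold(example, THRESHOLDS))  # ['Cancellation']
```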
To make this process easier, the Stream API allows you to specify your label thresholds in the Stream config. If specified, only labels with a confidence score above their threshold are returned.
In a real-life scenario, the target precision-recall performance will be decided by the combination of business requirements and historical model performance. For example, if a label historically achieved 85% precision at 55% recall, you may decide to invest additional time into training it up to 90% precision at 55% recall. You will then pin the new model version, pick new thresholds, and update the configuration of your application.
Having finalized our design, we are ready to start building out our application.
Go to the Models page and pin the model by clicking on the "Save" toggle. Once the model is pinned, you can start accessing it via the API.
If you want to follow this part of the tutorial using a different annotated dataset, you should make sure that it's sufficiently annotated. In particular, a dataset with only a few annotated examples will produce a model that won't return predictions for the majority of comments.
In most cases, it is enough to specify the Stream name, dataset name and model version, and the labels you are interested in. For a full list of options, see the reference.
For each label, you should specify a threshold. See the section earlier in this tutorial on how to pick label thresholds.
curl -X PUT 'https://<my_api_endpoint>/api/v1/datasets/<my-project>/<my-dataset>/streams' \
-H "Authorization: Bearer $REINFER_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"stream": {
"name": "<my-stream-name>",
"model": {
"version": <my-model-version>,
"label_thresholds": [
{
"name": [
"Parent Label",
"Child Label"
],
"threshold": <my-label-threshold>
},
{
"name": [
"Label Without Parent"
],
"threshold": <my-label-threshold>
}
]
}
}
}'
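If you build the request body programmatically, a sketch along these lines may help. It only constructs the JSON body shown in the request above using the thresholds from this tutorial; the function name, the stream name "triage-stream", and the model version 1 are placeholders, and sending the request is left to the HTTP client of your choice.

```python
import json
from typing import Dict


def build_stream_config(stream_name: str, model_version: int, thresholds: Dict[str, float]) -> dict:
    """Build the request body for creating a stream with per-label thresholds."""
    return {
        "stream": {
            "name": stream_name,
            "model": {
                "version": model_version,
                "label_thresholds": [
                    # Labels without a parent are a one-element name list.
                    {"name": [label], "threshold": value}
                    for label, value in thresholds.items()
                ],
            },
        }
    }


payload = build_stream_config(
    "triage-stream",  # placeholder stream name
    1,                # placeholder model version; use your pinned version
    {"Admin": 0.898, "Cancellation": 0.619, "Renewal": 0.702, "Urgent": 0.179},
)
print(json.dumps(payload, indent=2))
```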
You can now use your stream to fetch comments from Communications Mining. Note that very low batch sizes (such as fetching in batches of 1 comment) will impact the speed at which the comments are fetched.
curl -X POST 'https://<my_api_endpoint>/api/v1/datasets/<my-project>/<my-dataset>/streams/<my-stream-name>/fetch' \
-H "Authorization: Bearer $REINFER_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"size": <my-stream-batch-size>
}'
The initial position of the stream is set to its creation time. For development purposes, it is often useful to fetch comments that were created before the stream. In order to do so, you can set the stream to a specific timestamp.
curl -X POST 'https://<my_api_endpoint>/api/v1/datasets/<my-project>/<my-dataset>/streams/<my-stream-name>/reset' \
-H "Authorization: Bearer $REINFER_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"to_comment_created_at": "<YYYY-MM-DDTHH:MM:SS>"
}'
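As a small illustration, the reset body can be produced from a Python `datetime` in the `YYYY-MM-DDTHH:MM:SS` shape shown above. The function name is hypothetical, and how the API interprets time zones is not covered here, so check the reference before relying on a particular zone.

```python
from datetime import datetime


def build_reset_payload(moment: datetime) -> dict:
    """Build the request body for resetting a stream to a timestamp."""
    return {"to_comment_created_at": moment.strftime("%Y-%m-%dT%H:%M:%S")}
```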
If you send another fetch request now, it will fetch starting from the same position. In order to fetch the next batch of comments, you have to acknowledge the previous batch with an advance request. In the advance request, you have to provide a sequence_id, which you can find in your fetch response.
The fetch-and-advance loop guarantees that you don't accidentally skip comments if your application fails during processing. Note that your application needs to be able to handle seeing a comment multiple times in case of successfully processing a comment but failing at the advance step.
curl -X POST 'https://<my_api_endpoint>/api/v1/datasets/<my-project>/<my-dataset>/streams/<my-stream-name>/advance' \
-H "Authorization: Bearer $REINFER_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"sequence_id": "<my-sequence-id>"
}'
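The loop can be sketched as below, with the HTTP calls injected as plain functions so that the control flow is easy to test without a network connection. Several things here are assumptions for illustration: the function and parameter names, that the fetch response exposes its items under a "results" key, and that each comment carries a "uid" field. A production application would typically keep polling instead of returning on an empty batch, and would persist the set of processed IDs.

```python
from typing import Callable, Set


def run_fetch_and_advance(
    fetch: Callable[[], dict],
    advance: Callable[[str], None],
    process: Callable[[dict], None],
    seen_uids: Set[str],
) -> int:
    """Process batches until a fetch returns no results.

    Returns the number of comments processed. Comments whose uid is already
    in seen_uids are skipped, which covers the case where processing
    succeeded earlier but the advance request failed.
    """
    processed = 0
    while True:
        batch = fetch()
        results = batch.get("results", [])  # assumed response key
        if not results:
            return processed
        for result in results:
            uid = result["comment"]["uid"]  # assumed comment field
            if uid in seen_uids:
                continue
            process(result)
            seen_uids.add(uid)
            processed += 1
        # Acknowledge the batch only after every comment in it was handled.
        advance(batch["sequence_id"])
```

Advancing only after the whole batch has been processed is what gives the at-least-once behaviour described above: a crash mid-batch means the same batch is fetched again on restart.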
Note that you have to provide the dataset name in all API requests using the stream - this is because streams are scoped under datasets.
The response will contain comments, and predicted labels and general fields, as described on the Comments and Labels and general fields pages. Please refer to these pages to understand how to parse the response.
If your application allows users to tag items that were predicted incorrectly, you can use the exception endpoint to tag the corresponding comment as an exception in the platform. The exception name will be available as a filter in the dataset, so that a model trainer can review and annotate exceptions to improve the model.
curl -X PUT 'https://<my_api_endpoint>/api/v1/datasets/<my-project>/<my-dataset>/streams/<my-stream-name>/exceptions' \
-H "Authorization: Bearer $REINFER_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"exceptions": [
{
"metadata": {
"type": "Wrong Prediction"
},
"uid": "<comment-uid>"
},
{
"metadata": {
"type": "Wrong Prediction"
},
"uid": "<comment-uid>"
}
]
}'
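A request body of this shape can be assembled with a small helper like the one below. The helper name is hypothetical, and the "No Prediction" type string in the usage note is an assumed name for the exception type discussed in the design section; the request above only shows "Wrong Prediction".

```python
from typing import Iterable


def build_exceptions_payload(comment_uids: Iterable[str], exception_type: str) -> dict:
    """Build the request body that tags the given comments as exceptions."""
    return {
        "exceptions": [
            {"metadata": {"type": exception_type}, "uid": uid}
            for uid in comment_uids
        ]
    }
```

For instance, `build_exceptions_payload(["<comment-uid>"], "No Prediction")` yields a body with a single exception entry for that comment.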
Congratulations, you have completed the Communications Mining automation tutorial. Of course, your own automation application may be different from what is covered here. Please contact support if you have questions.