Communications Mining User Guide
Last updated October 6, 2025
This section outlines the core concepts of the platform.
To learn more about the platform from an end-user perspective, check the Communications Mining™ user guide.
Concept | Description | Example |
---|---|---|
Sources | In Communications Mining™, data is organized in data sources, or sources. Typically a source corresponds to a channel. An email mailbox, the results of a survey, or a set of customer reviews are all examples of data that can be uploaded to Communications Mining as a data source. Multiple sources can be combined to build a model, so it is best to err on the side of multiple sources rather than a single monolithic source. | The diagram shows email data (Source A, containing individual emails) and customer review data (Sources B and C, containing individual customer reviews). The customer review data is split into two sources according to where the data came from, but the two are combined into a single dataset in order to build a common model. |
Comments | Within sources, each individual piece of text communication is represented as a comment. A comment always has an ID, a timestamp, and a text body, plus additional fields that depend on the type of data it represents. For example, emails have the expected email fields such as From, To, Cc, and so on. | The diagram shows how the available comment fields are used by the various comment types. For example, in an email comment the From field contains the sender address, while in a customer review comment it contains the review author. The metadata fields, shown at the bottom of each comment, are user-defined. Note that the same set of fields is used for both customer review sources: because they are combined into a single dataset, the data should be consistent in order to ensure good model performance. |
Datasets | A dataset allows you to annotate one or more sources in order to build a model. A source can be included in multiple datasets. The set of all labels in a dataset is called a taxonomy. | The diagram shows two datasets built on top of the support mailbox data, and one dataset combining the customer review data. Note that even though Dataset 1 and Dataset 2 are based on the same data, their label taxonomies differ, because their use cases (analytics and automation) call for different sets of labels. |
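Models | A model is continuously updated as users annotate more data. To get consistent predictions, specify the model version number when querying the model. | |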
Labels | Labels are applied when training a model, and are returned when querying the model for predictions. When labels are returned as predictions, each has an associated confidence score that indicates how likely the model considers the label to apply. To convert a prediction into a Yes or No answer, the confidence score needs to be checked against a threshold, which is chosen to represent a suitable precision/recall tradeoff. | Labels are assigned by Communications Mining users when training the model. The Communications Mining user interface helps the user annotate the most relevant comments, apply labels consistently, and annotate enough comments to produce a well-performing model. |
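
As a rough illustration of the Comments and Labels rows above, the following sketch models a comment with its always-present fields and converts predicted labels into Yes or No decisions by checking each confidence score against a per-label threshold. The `Comment` and `Prediction` classes, the `apply_thresholds` helper, the label names, and the threshold values are all hypothetical and chosen for this example; they are not the platform's API or schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Comment:
    """Hypothetical in-memory shape of a comment (not the platform's schema)."""
    id: str
    timestamp: datetime
    text: str
    metadata: dict = field(default_factory=dict)  # user-defined fields


@dataclass
class Prediction:
    """Hypothetical predicted label with the model's confidence score."""
    label: str
    confidence: float  # expected to lie in [0.0, 1.0]


def apply_thresholds(predictions, thresholds, default=0.5):
    """Return labels whose confidence meets or exceeds their threshold.

    Labels without an entry in `thresholds` fall back to `default`.
    """
    return [
        p.label
        for p in predictions
        if p.confidence >= thresholds.get(p.label, default)
    ]


# A customer review comment with one user-defined metadata field.
comment = Comment(
    id="review-001",
    timestamp=datetime(2025, 1, 15, 9, 30, tzinfo=timezone.utc),
    text="The app keeps logging me out and support never replied.",
    metadata={"rating": 2},
)

# Made-up predictions for the comment above.
predictions = [
    Prediction("Login Issue", 0.91),
    Prediction("Support Delay", 0.62),
    Prediction("Praise", 0.03),
]

# Illustrative per-label thresholds; in practice each would be chosen
# to give a suitable precision/recall tradeoff for that label.
thresholds = {"Login Issue": 0.80, "Support Delay": 0.70}

accepted = apply_thresholds(predictions, thresholds)
print(accepted)  # ['Login Issue']
```

Raising a label's threshold generally trades recall for precision, and lowering it does the reverse, so an automation use case might pick a higher threshold than an analytics one.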