
Communications Mining user guide

Last updated October 7, 2025

General field extraction

Communications Mining™ extracts the following types of output from unstructured text:
  • Labels
  • General fields

Labels describe the entire message, for example, Cancellation, Trade failure, or Urgent. General fields refer to specific parts of the message, for example, Counterparty name, Customer ID, or Cancellation date.

In a downstream process, labels are used to triage and prioritize messages and to decide what kind of action should be taken. General fields are used to fill in the fields of a request. For example, a downstream process may filter messages to those that have the Cancellation label, and then use the extracted Customer ID and Cancellation date general fields to call an API that processes the cancellation automatically.
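The downstream process described above can be sketched in Python. This is a minimal sketch, not part of Communications Mining: it assumes the caller has already collected the predicted label names into a set of strings, and that entities are dicts with `name` and `formatted_value` keys (the shape used in the API response example on this page). The general field names `customer-id` and `cancellation-date` and the payload keys are hypothetical.

```python
from typing import Any, Optional


def first_value(entities: list[dict[str, Any]], name: str) -> Optional[str]:
    """Return the formatted value of the first entity with the given name, if any."""
    for entity in entities:
        if entity["name"] == name:
            return entity["formatted_value"]
    return None


def build_cancellation_request(
    label_names: set[str], entities: list[dict[str, Any]]
) -> Optional[dict[str, str]]:
    """Build a payload for a hypothetical cancellation API, or return None."""
    # Only messages labelled "Cancellation" are processed automatically.
    if "Cancellation" not in label_names:
        return None
    # Hypothetical general field names, following the "policy-number" style.
    customer_id = first_value(entities, "customer-id")
    cancellation_date = first_value(entities, "cancellation-date")
    # Without both values the API call cannot be made; leave it to a person.
    if customer_id is None or cancellation_date is None:
        return None
    return {"customer_id": customer_id, "cancellation_date": cancellation_date}
```

Taking the first match is a simplification; missing and multiple matches are covered in the edge cases on this page.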

Communications Mining comes with a number of built-in general fields for common concepts, such as Organization, Currency Code, or Date. You can customize these built-in general fields to tailor them to your specific use case. For example, Communications Mining has a highly trained pre-built Date general field that you can use as a starting point for a more specific general field such as Renewal Date or Cancellation Date. Alternatively, you can start from scratch and teach Communications Mining to recognize something completely new.



Configuring general fields

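We will use an insurance use case as an example. The insurance company's mailbox receives emails from brokers, which need to be triaged to different teams for processing. In this example, the dataset has already been trained with the following taxonomy:
Figure 1. Example taxonomy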

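This mailbox occasionally receives urgent renewal requests, cancellation requests, and admin requests. Communications Mining™ has been trained to recognize each of these concepts, and its predictions are used to triage emails to the correct team by creating support tickets.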

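To respond quickly to customers, we can extract some key data points that help downstream teams process the requests. Specifically, we want to extract the policy number, the insured organization name, and the broker name from each email. We can do this with general field extraction.
Figure 2. Configured general fields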

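Because the policy number format is specific to this particular insurance company, we configure that general field to be trainable from scratch. The insured organization, on the other hand, is a kind of organization, so we configure it as trainable based on the built-in Organization general field. Finally, we notice that brokers do not always include their name in the email, so instead of extracting it as a general field, we decide to use the broker's email address (available from the comment metadata) to look up the corresponding name in an internal database.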

The following table summarizes these approaches.

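| Configuration | When to use | Examples |
|---|---|---|
| Trainable general field without a base general field | Most commonly used for various internal IDs, or when there is no suitable base general field in Communications Mining. | Policy Number, Customer ID |
| Trainable general field with a base general field | Used to customize an existing pre-built general field in Communications Mining. | Cancellation Date (based on Date), Insured Organization (based on Organization) |
| Pre-built general field (not trainable) | Used for general fields that should match exactly as defined, where training would lead to errors. | 位于 |
| Use comment metadata instead of a general field | Used when the required information is already present in structured form in the comment metadata. | Sender Address, Sender Domain |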
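In this example, the broker name falls under the last row of the table: rather than extracting it, the application looks it up from the sender address in the comment metadata. The sketch below assumes the sender address has already been read from the metadata as a plain string, and uses an in-memory dict as a hypothetical stand-in for the internal broker database. The names `BROKER_DIRECTORY` and `broker_name_from_sender` are illustrative only.

```python
from typing import Optional

# Hypothetical stand-in for the internal broker database, keyed by
# lower-cased email address.
BROKER_DIRECTORY: dict[str, str] = {
    "jane.doe@broker.example": "Jane Doe",
}


def broker_name_from_sender(
    sender_address: str, directory: dict[str, str] = BROKER_DIRECTORY
) -> Optional[str]:
    """Look up a broker's name from the sender address, or return None."""
    # Normalise whitespace and case before the lookup. Treating the whole
    # address as case-insensitive is an assumption about your directory.
    key = sender_address.strip().lower()
    return directory.get(key)
```

A `None` result means the sender is not in the directory, so the application can fall back to leaving the broker name empty.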

Using general fields in your application

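Communications Mining™ provides several ways to fetch predictions, including predicted general fields. See the Data download overview to find out which method best suits your use case.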

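Whichever method you choose, you need to be aware of the following edge cases and handle them in your application:

  • Not all expected general fields are included in the response
  • The response contains multiple matches for one or more general fields
  • Not all general fields in the response are correct

In this section, we describe each edge case in more detail.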

Not all general fields are included in the response

You should expect to handle cases where not all expected general fields are present. In the following example, the email has the policy number, but doesn't have the insured organization name. Your application should be able to handle such partial information.
Figure 3. Missing Insured Organization
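One way to handle partial information is to separate the fields that were found from those that are missing, so the application can proceed with what it has and flag the rest. In this minimal sketch, entity dicts follow the `name` / `formatted_value` shape of the API response shown on this page; `insured-organization` is a hypothetical field name (only `policy-number` appears in that response).

```python
from typing import Any

# Fields this application expects in every email. "insured-organization"
# is a hypothetical name; "policy-number" matches the example response.
EXPECTED_FIELDS = ("policy-number", "insured-organization")


def split_fields(
    entities: list[dict[str, Any]], expected: tuple[str, ...] = EXPECTED_FIELDS
) -> tuple[dict[str, str], list[str]]:
    """Return (found values keyed by field name, names of missing fields)."""
    found: dict[str, str] = {}
    for entity in entities:
        name = entity["name"]
        # Keep the first match only; multiple matches are a separate case.
        if name in expected and name not in found:
            found[name] = entity["formatted_value"]
    missing = [name for name in expected if name not in found]
    return found, missing


found, missing = split_fields(
    [{"name": "policy-number", "formatted_value": "GHI-0204963"}]
)
if missing:
    # For example, create the ticket anyway and ask a person to fill the gaps.
    print(f"Missing fields: {missing}")  # prints: Missing fields: ['insured-organization']
```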

The response contains multiple matches for one or more general fields

You should also expect to handle the opposite of the previous case, namely cases where a comment has more general fields than expected. In the following example, even though we expect one policy number and insured organization name per email, the email has multiple policy numbers.
Figure 4. Multiple matches for the same general field

Note that you can use the metadata in the response when handling such cases. For example, we can choose to prefer policy numbers that appear in the email subject over those that appear in the email body. The following example shows the response that the API returns for our example email.

{
  "predictions": [
    {
      "uid": "aa05ba2250de48e3.7588b85f68f81c3b",
      "labels": [...],
      "entities": [
        {
          "id": "6a1d11118b60868e",
          "name": "policy-number",
          "span": {
            "content_part": "body",
            "message_index": 0,
            "utf16_byte_start": 200,
            "utf16_byte_end": 222,
            "char_start": 100,
            "char_end": 111
          },
          "kind": "policy-number",
          "formatted_value": "GHI-0204963"
        },
        {
          "id": "6a1d11118b60868e",
          "name": "policy-number",
          "span": {
            "content_part": "subject",
            "message_index": 0,
            "utf16_byte_start": 0,
            "utf16_byte_end": 22,
            "char_start": 0,
            "char_end": 11
          },
          "kind": "policy-number",
          "formatted_value": "GHI-0068448"
        },
        {...},
        {...},
        {...}
      ]
    }
  ],
  "model": {
    "version": 31,
    "time": "2021-07-14T15:00:57.608000Z"
  },
  "status": "ok"
}
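The subject-first preference can be expressed as a ranking over the `span` metadata of each match. This sketch is an assumption about how an application might rank candidates, not built-in behavior: it prefers `subject` over `body` using the `content_part` field, then breaks ties by lower `message_index` and lower `char_start`, which are arbitrary choices. The elided `{...}` entries from the response are not reproduced.

```python
from typing import Any, Optional

# Lower rank means higher priority; unknown content parts rank last.
CONTENT_PART_RANK = {"subject": 0, "body": 1}


def pick_field_value(entities: list[dict[str, Any]], field_name: str) -> Optional[str]:
    """Return the preferred match for field_name, or None if there is none."""
    candidates = [e for e in entities if e.get("name") == field_name]
    if not candidates:
        return None

    def sort_key(entity: dict[str, Any]) -> tuple[int, int, int]:
        span = entity["span"]
        return (
            CONTENT_PART_RANK.get(span["content_part"], len(CONTENT_PART_RANK)),
            span["message_index"],
            span["char_start"],
        )

    return min(candidates, key=sort_key)["formatted_value"]


# The two policy-number entities from the example response (other keys omitted).
entities = [
    {"name": "policy-number", "formatted_value": "GHI-0204963",
     "span": {"content_part": "body", "message_index": 0, "char_start": 100}},
    {"name": "policy-number", "formatted_value": "GHI-0068448",
     "span": {"content_part": "subject", "message_index": 0, "char_start": 0}},
]
print(pick_field_value(entities, "policy-number"))  # prints: GHI-0068448
```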

Not all general fields in the response are correct

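Finally, because general fields are extracted using machine learning, you should expect some incorrect matches. The number of incorrect matches depends on which general field you use. The dataset's Validation page provides validation statistics that show how well each general field performs.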
Figure 5. General field validation
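Validation statistics describe how often a general field is wrong overall, not which individual matches are wrong. For fields with a known structure, a cheap format check can catch some bad matches before they reach downstream systems. The pattern below is a hypothetical format inferred only from the two example values in the response on this page (`GHI-0204963`, `GHI-0068448`): three uppercase letters, a hyphen, and seven digits. Replace it with your real format.

```python
import re
from typing import Optional

# Hypothetical policy number format, inferred from example values such as
# "GHI-0204963": three uppercase letters, a hyphen, then seven ASCII digits.
POLICY_NUMBER_PATTERN = re.compile(r"[A-Z]{3}-[0-9]{7}")


def validated_policy_number(value: str) -> Optional[str]:
    """Return the cleaned value if it matches the expected format, else None."""
    candidate = value.strip()
    if POLICY_NUMBER_PATTERN.fullmatch(candidate):
        return candidate
    # A None result could route the message to manual review.
    return None
```

A format check only rejects malformed values; a well-formed but wrong policy number still passes, so it complements rather than replaces the validation statistics.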

© 2005-2025 UiPath. All rights reserved.