Communications Mining developer guide
Last updated Sep 27, 2024

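Elasticsearch integration

Communications Mining provides a rich set of built-in analytics tools. Sometimes, however, you need to combine predictions from Communications Mining with data that cannot be uploaded as part of Communications Mining comments. In those cases, a common solution is to index the Communications Mining predictions and any additional data into Elasticsearch, and to use a tool such as Kibana to drive analytics. This tutorial shows how to import Communications Mining data into Elasticsearch and visualize it in Kibana.

The data used in the examples in this tutorial consists of fictitious emails generated for the insurance domain.

Storing data in Elasticsearch

First, let's define the data we want to import into Elasticsearch. The Communications Mining API provides the comment text, comment metadata, predicted labels, and predicted general fields in a nested JSON object. Below is an example of a raw comment as provided by the Communications Mining API. (Note that you may see different metadata fields depending on how the data was ingested into Communications Mining. The comment reference in the API documentation describes the comment object fields in more detail.)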

{
  "comment": {
    "id": "c7a1c529-3f57-4be6-9102-c9f892b81ae51",
    "uid": "49ba2c56a945386c.c7a1c529-3f57-4be6-9102-c9f892b81ae51",
    "timestamp": "2021-03-29T08:36:25.607Z",
    "messages": [
      {
        "body": {
          "text": "The policyholder has changed their address to the new address: 19 Essex Gardens, SW17 2UL"
        },
        "subject": {
          "text": "Change of address - Policy SFG48807871"
        },
        "from": "CPX8460080@broker.com",
        "to": ["underwriter@insurer.com"],
        "sent_at": "2021-03-29T08:36:25.607Z"
      }
    ]
    // (... more properties ...)
  },
  "labels": [
    {
      "name": ["Admin"],
      "probability": 0.9995054006576538
    },
    {
      "name": ["Admin", "Change of address"],
      "probability": 0.9995054006576538
    }
  ],
  "entities": [
    {
      "name": "address-line-1",
      "formatted_value": "19 Essex Gardens",
      "span": {
        "content_part": "body",
        "message_index": 0,
        "char_start": 63,
        "char_end": 79,
        "utf16_byte_start": 126,
        "utf16_byte_end": 158
      }
    },
    {
      "name": "post-code",
      "formatted_value": "SW17 2UL",
      "span": {
        "content_part": "body",
        "message_index": 0,
        "char_start": 81,
        "char_end": 89,
        "utf16_byte_start": 162,
        "utf16_byte_end": 178
      }
    },
    {
      "name": "policy-number",
      "formatted_value": "SFG48807871",
      "span": {
        "content_part": "subject",
        "message_index": 0,
        "char_start": 27,
        "char_end": 38,
        "utf16_byte_start": 54,
        "utf16_byte_end": 76
      }
    }
  ]
}

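The schema of the raw comment returned by the Communications Mining API is not convenient for filtering and querying the data in Elasticsearch, so you should change the schema before ingesting the data into Elasticsearch. Below is an example of a flattened schema you could use. Add whichever fields your use case requires.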

{
  "id": "c7a1c529-3f57-4be6-9102-c9f892b81ae51",
  "uid": "49ba2c56a945386c.c7a1c529-3f57-4be6-9102-c9f892b81ae51",
  "timestamp": "2021-03-29T08:36:25.607Z",
  "subject": "Change of address - Policy SFG48807871",
  "body": "The policyholder has changed their address to the new address: 19 Essex Gardens, SW17 2UL",
  // (... more fields ...)
  "labels": ["Admin", "Admin > Change of address"],
  "entities": {
    "policy-number": ["SFG48807871"],
    "address-line-1": ["19 Essex Gardens"],
    "post-code": ["SW17 2UL"]
  }
}

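Note that a comment can have zero, one, or multiple labels, so the labels field has to be an array. Likewise, if the dataset has one or more general field types configured, a comment will contain zero, one, or multiple general fields of each type. Hierarchical label names in the raw API response are themselves arrays ( ["Admin", "Change of address"] ) and should be converted to strings ( "Admin > Change of address" ).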
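
The flattening described above can be sketched in Python as follows. This is a minimal sketch rather than part of the Communications Mining API: the function name flatten_comment is hypothetical, and the code assumes that the subject and body of the first message are the ones you want to index, and that every general field carries a name and a formatted_value as in the raw example above. General field names are used as keys unchanged.

```python
from collections import defaultdict
from typing import Any


def flatten_comment(raw: dict[str, Any]) -> dict[str, Any]:
    """Flatten one raw API result into a document for Elasticsearch."""
    comment = raw["comment"]
    # Assumption: the first message holds the subject and body to index.
    message = comment["messages"][0]

    # Hierarchical label names arrive as arrays; join each into one string.
    labels = [" > ".join(label["name"]) for label in raw.get("labels", [])]

    # Group general field values by type, so each type maps to a list
    # that can hold zero, one, or multiple values.
    entities: dict[str, list[str]] = defaultdict(list)
    for entity in raw.get("entities", []):
        entities[entity["name"]].append(entity["formatted_value"])

    return {
        "id": comment["id"],
        "uid": comment["uid"],
        "timestamp": comment["timestamp"],
        "subject": message.get("subject", {}).get("text", ""),
        "body": message.get("body", {}).get("text", ""),
        "labels": labels,
        "entities": dict(entities),
    }
```

A comment without predictions yields an empty labels array and an empty entities object, which keeps the document shape stable for queries.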

Fetching the data

To fetch the data, we recommend using the Stream API. (See the data download overview for all available download methods.) When creating a Stream, set a threshold for each label so that labels with confidence scores below the threshold are discarded. The easiest way to do this is from the "Streams" page of a dataset in the Communications Mining UI. Having used the confidence scores to decide whether a label applies, you can then import just the label names into Elasticsearch. (See the Labels for Analytics section for a discussion of when we recommend dropping or keeping label confidence scores.)

General fields do not have confidence scores, so they need no special handling.
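
A fetch loop over a Stream might look like the sketch below. It assumes a fetch-then-acknowledge pattern in which each batch is acknowledged before the next one is requested. The two callables fetch_batch and advance are hypothetical wrappers that you would write around the HTTP calls, and the response field names results and sequence_id are assumptions to check against the API reference.

```python
from typing import Any, Callable, Iterator


def iter_stream(
    fetch_batch: Callable[[], dict[str, Any]],
    advance: Callable[[Any], None],
) -> Iterator[dict[str, Any]]:
    """Yield Stream results batch by batch until the Stream is exhausted.

    fetch_batch and advance are hypothetical wrappers around the HTTP
    calls; "results" and "sequence_id" are assumed response field names.
    """
    while True:
        batch = fetch_batch()
        results = batch.get("results", [])
        if not results:
            # An empty batch means there is nothing left to read for now.
            return
        yield from results
        # Acknowledge the batch only after every result has been consumed,
        # so an interrupted run does not skip the rest of the batch.
        advance(batch["sequence_id"])
```

Passing the callables in keeps the loop independent of any particular HTTP client and lets you exercise it with stubbed batches.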

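Note:

Model change management

When creating a Stream, you specify a model version. This model version is used to provide predictions when you fetch comments from the Stream. Even as users continue to train new model versions in the platform, your Stream keeps using the model version you specified, which gives you deterministic results.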

To upgrade to a new model version, create a new Stream that uses that model version, then update your code to use the new Stream. (For this reason, we recommend making the Stream name configurable in your code.) To keep analytics based on predictions consistent, re-ingest the predictions for historical data using the updated model version. You can do that by resetting the Stream to a timestamp before your oldest comment and re-ingesting the data from the start.
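
Re-ingesting historical data means the same comments are indexed a second time. One way to avoid duplicates is to use the comment uid as the Elasticsearch document ID, so that a re-indexed comment overwrites its earlier version. The sketch below builds a request body in the newline-delimited JSON format of the Elasticsearch bulk API. The function name build_bulk_body is hypothetical, the index name example-data is the one used in the Timelion expression in this tutorial, and the choice of uid as the document ID is an assumption, not a requirement.

```python
import json
from typing import Any, Iterable


def build_bulk_body(docs: Iterable[dict[str, Any]], index: str = "example-data") -> str:
    """Build a newline-delimited JSON body for the Elasticsearch bulk API."""
    lines: list[str] = []
    for doc in docs:
        # Action line: index the document under a stable ID, so indexing
        # the same comment again replaces the earlier document.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["uid"]}}))
        # Source line: the flattened document itself.
        lines.append(json.dumps(doc))
    if not lines:
        return ""
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"
```

The resulting string can be sent to the _bulk endpoint with the Content-Type header set to application/x-ndjson.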

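Visualizing data in Kibana

Once the data is indexed in Elasticsearch, you can start building visualizations. This section provides simple examples of a number of common visualization tools in Kibana.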
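
Visualizations that group comments by label rely on terms aggregations, which work on keyword fields rather than on analyzed text. The sketch below shows one possible index mapping for the flattened schema, written as a Python dictionary that you could send as the request body when creating the index. The variable name INDEX_MAPPING and the dynamic template name are hypothetical, and the field list only covers the fields from the flattened example, so treat it as a starting point under those assumptions.

```python
# One possible mapping for the flattened comment documents (a sketch).
INDEX_MAPPING = {
    "mappings": {
        "properties": {
            "id": {"type": "keyword"},
            "uid": {"type": "keyword"},
            "timestamp": {"type": "date"},
            # Free text, analyzed for full-text search.
            "subject": {"type": "text"},
            "body": {"type": "text"},
            # Exact values, so labels can be filtered and aggregated.
            "labels": {"type": "keyword"},
        },
        "dynamic_templates": [
            {
                # General field types differ per dataset, so map every
                # string under "entities" to keyword as it first appears.
                "entities_as_keywords": {
                    "path_match": "entities.*",
                    "match_mapping_type": "string",
                    "mapping": {"type": "keyword"},
                }
            }
        ],
    }
}
```

Using a dynamic template for the entities object avoids listing every general field type up front.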

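Timelion

You can use the following expression to produce a chart of the 5 most common labels over time. Note that this shows both top-level category labels and sub-category labels.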

.es(index=example-data,split=labels:5,timefield=@timestamp)
    .label("$1", "^.* > labels:(.+) > .*")
Figure 1. The top 5 labels in the dataset over time.

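Bar chart

This bar chart shows the top 20 sender email addresses in the dataset. Sender and recipient email addresses are part of the comment metadata in email-based datasets.
Figure 2. The top 20 sender email addresses.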

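Pie chart

This pie chart shows the sub-category labels under the top-level "Claim" label. The label categories are defined by the users who train the model.
Figure 3. Sub-categories of the Claim label.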
