AI Center

Light Text Classification

Out of the Box Packages > UiPath Language Analysis > Light Text Classification

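This is a generic, retrainable model for text classification. It supports all languages written in the Latin alphabet, such as English, French, and Spanish. This ML package must be trained before use: if you deploy it without training it first, the deployment fails with an error stating that the model is not trained. The model is based on Bag of Words and provides explainability based on n-grams.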

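Details

Input type

JSON and CSV

Input description

Text to be classified, passed as a string: 'I loved this movie.'

Output description

JSON with the predicted class and a confidence score (between 0 and 1).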

{
    "class": "7",
    "confidence": 0.1259827300369445,
    "ngrams": [
        [
            "like",
            1.3752658445706787
        ],
        [
            "like this",
            0.032029048484416685
        ]
    ]
}
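
As a rough illustration of how a caller might consume this response, the Python sketch below parses the sample output and ranks the n-grams by weight. The helper name summarize_prediction is hypothetical and not part of the package. The sketch treats the ngrams field as optional, on the assumption that it is absent when n-gram explanations are not requested.

```python
import json

# Sample response copied from the output description above.
RESPONSE = """
{
    "class": "7",
    "confidence": 0.1259827300369445,
    "ngrams": [["like", 1.3752658445706787], ["like this", 0.032029048484416685]]
}
"""


def summarize_prediction(raw: str, top_k: int = 3) -> dict:
    """Hypothetical helper: extract class, confidence, and the top-weighted n-grams."""
    payload = json.loads(raw)
    confidence = float(payload["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    # Assumption: "ngrams" may be missing, so fall back to an empty list.
    ngrams = [(text, float(weight)) for text, weight in payload.get("ngrams", [])]
    ngrams.sort(key=lambda pair: pair[1], reverse=True)
    return {
        "class": payload["class"],
        "confidence": confidence,
        "top_ngrams": ngrams[:top_k],
    }


summary = summarize_prediction(RESPONSE)
print(summary["class"], summary["confidence"], summary["top_ngrams"])
```

Note that the class is returned as a string ("7"), so compare it against string labels rather than integers.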

GPU recommendation

No GPU is required.

Training enabled

By default, training is enabled.

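Pipelines

This package supports all three types of pipelines (Full Training, Training, and Evaluation). The model uses advanced techniques to run a hyperparameter search and find a high-performing model. By default, hyperparameter search (the BOW.hyperparameter_search.enable variable) is enabled. The parameters of the best-performing model are available in the Evaluation Report.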

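Dataset format

Three options are available for structuring the dataset for this model: JSON, CSV, and the AI Center™ JSON format (which is also the export format of the labeling tool, currently in private preview). The model reads all CSV and JSON files in the specified directory. For every format, the model expects two columns or properties, identified by dataset.input_column_name and dataset.target_column_name. The names of these two columns and/or the directory can be configured using environment variables.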

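CSV file format

Each CSV file can have any number of columns, but the model uses only the two columns specified by the dataset.input_column_name and dataset.target_column_name parameters.

See the following sample and environment variables for an example of the CSV file format.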

text, label
I like this movie, 7
I hated the acting, 9

The environment variables for the previous example are as follows:

dataset.input_format: auto
dataset.input_column_name: text
dataset.target_column_name: label
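
To make the column selection concrete, here is a minimal Python sketch that reads the sample above and keeps only the two configured columns. It is not the package's own loader: the function read_csv_examples and its parameters are hypothetical, and stripping the blank after each comma is an assumption made so that the header in the sample is read as "label" rather than " label".

```python
import csv
import io

# Sample file content copied from the example above.
SAMPLE_CSV = """text, label
I like this movie, 7
I hated the acting, 9
"""


def read_csv_examples(handle, input_column="text", target_column="label"):
    """Hypothetical loader: return (text, label) pairs from the configured columns."""
    # skipinitialspace drops the blank that follows each comma in the sample.
    reader = csv.DictReader(handle, skipinitialspace=True)
    missing = {input_column, target_column} - set(reader.fieldnames or [])
    if missing:
        raise KeyError(f"missing columns: {sorted(missing)}")
    # Any other columns in the file are simply ignored.
    return [(row[input_column].strip(), row[target_column].strip()) for row in reader]


examples = read_csv_examples(io.StringIO(SAMPLE_CSV))
print(examples)
```

Text that itself contains commas has to be quoted in the CSV file, otherwise it is split into extra columns.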

JSON file format

Multiple data points can be part of the same JSON file.

See the following sample and environment variables for an example of the JSON file format.

[
  {
    "text": "I like this movie",
    "label": "7"
  },
  {
    "text": "I hated the acting",
    "label": "9"
  }
]

The environment variables for the previous example are as follows:

dataset.input_format: auto
dataset.input_column_name: text
dataset.target_column_name: label
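
The following Python sketch shows one way the same two properties could be pulled out of a JSON file that holds several data points, together with a quick count of examples per class. The function read_json_examples is hypothetical, and the sketch assumes the file contains a list of flat objects as in the sample above.

```python
import json
from collections import Counter

# Sample file content copied from the example above.
SAMPLE_JSON = """
[
  {"text": "I like this movie", "label": "7"},
  {"text": "I hated the acting", "label": "9"}
]
"""


def read_json_examples(raw, input_column="text", target_column="label"):
    """Hypothetical loader: return (text, label) pairs from a JSON list of objects."""
    records = json.loads(raw)
    if not isinstance(records, list):
        raise TypeError("expected a JSON list of data points")
    return [(record[input_column], record[target_column]) for record in records]


pairs = read_json_examples(SAMPLE_JSON)
label_counts = Counter(label for _, label in pairs)
print(pairs)
print(label_counts)
```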

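ai_center file format

This is the default value of the dataset.input_format environment variable. With this format, the model reads every file with a .json extension in the specified directory.

See the following sample and environment variables for an example of the ai_center file format.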

{
    "annotations": {
        "intent": {
            "to_name": "text",
            "choices": [
                "TransactionIssue",
                "LoanIssue"
            ]
        },
        "sentiment": {
            "to_name": "text",
            "choices": [
                "Very Positive"
            ]
        },
        "ner": {
            "to_name": "text",
            "labels": [
                {
                    "start_index": 37,
                    "end_index": 47,
                    "entity": "Stakeholder",
                    "value": " Citi Bank"
                },
                {
                    "start_index": 51,
                    "end_index": 61,
                    "entity": "Date",
                    "value": "07/19/2018"
                },
                {
                    "start_index": 114,
                    "end_index": 118,
                    "entity": "Amount",
                    "value": "$500"
                },
                {
                    "start_index": 288,
                    "end_index": 293,
                    "entity": "Stakeholder",
                    "value": " Citi"
                }
            ]
        }
    },
    "data": {
        "cc": "",
        "to": "xyz@abc.com",
        "date": "1/29/2020 12:39:01 PM",
        "from": "abc@xyz.com",
        "text": "I opened my new checking account with Citi Bank in 07/19/2018 and met the requirements for the promotion offer of $500 . It has been more than 6 months and I have not received any bonus. I called the customer service several times in the past few months but no any response. I request the Citi honor its promotion offer as advertised."
    }
}

To use the previous sample JSON, the environment variables must be set as follows:

dataset.input_format: ai_center
dataset.input_column_name: data.text
dataset.target_column_name: annotations.intent.choices
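
The dotted values above read naturally as paths into the nested JSON. The Python sketch below illustrates that reading on a trimmed copy of the sample record. How the package resolves these names internally is not documented here, so the resolve helper is an assumption made for illustration only.

```python
import json

# Trimmed copy of the sample record above (only the fields used here).
RECORD = json.loads("""
{
    "annotations": {
        "intent": {"to_name": "text", "choices": ["TransactionIssue", "LoanIssue"]},
        "sentiment": {"to_name": "text", "choices": ["Very Positive"]}
    },
    "data": {
        "text": "I opened my new checking account with Citi Bank in 07/19/2018"
    }
}
""")


def resolve(record, dotted_path):
    """Hypothetical helper: follow a dotted path such as "data.text" through nested objects."""
    value = record
    for key in dotted_path.split("."):
        value = value[key]
    return value


text = resolve(RECORD, "data.text")
labels = resolve(RECORD, "annotations.intent.choices")
print(text)
print(labels)
```

Pointing the target at annotations.sentiment.choices instead would select the sentiment labels of the same record.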

Training on GPU or CPU

No GPU is required for training.

Environment variables

dataset.input_column_name
- The name of the input column containing the text.
- The default value is data.text.
- Set this variable according to your input JSON or CSV file.
dataset.target_column_name
- The name of the target column containing the class labels.
- The default value is annotations.intent.choices.
- Set this variable according to your input JSON or CSV file.
dataset.input_format
- The input format of the training data.
- The default value is ai_center.
- Supported values are ai_center or auto.
- If ai_center is selected, only JSON files are supported. In that case, also set dataset.target_column_name to the annotation property that holds your labels (for example, annotations.sentiment.choices).
- If auto is selected, both CSV and JSON files are supported.
BOW.hyperparameter_search.enable
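- The default value of this parameter is True. If left enabled, the pipeline searches for the best-performing model within the given time limit and compute resources.
- It also generates a PDF file, HyperparameterSearch_report, that shows the parameter variations that were tried.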
BOW.hyperparameter_search.timeout
- The maximum time, in seconds, that the hyperparameter search is allowed to run.
- The default value is 1800.
BOW.explain_inference
- When set to True, some of the most important n-grams are returned along with the prediction at inference time, when the model is served as an ML Skill.
- The default value is False.
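
The defaults and supported values listed above can be summarized as a small validation sketch in Python. This is only an illustration of the documented constraints: the effective_settings helper is hypothetical, and representing every value as a string is an assumption about how the variables are entered.

```python
# Defaults taken from the list above; all values are kept as strings (assumption).
DEFAULTS = {
    "dataset.input_column_name": "data.text",
    "dataset.target_column_name": "annotations.intent.choices",
    "dataset.input_format": "ai_center",
    "BOW.hyperparameter_search.enable": "True",
    "BOW.hyperparameter_search.timeout": "1800",
    "BOW.explain_inference": "False",
}
SUPPORTED_FORMATS = {"ai_center", "auto"}
BOOLEAN_VARIABLES = ("BOW.hyperparameter_search.enable", "BOW.explain_inference")


def effective_settings(overrides):
    """Hypothetical helper: merge overrides into the defaults and check documented constraints."""
    settings = {**DEFAULTS, **overrides}
    if settings["dataset.input_format"] not in SUPPORTED_FORMATS:
        raise ValueError("dataset.input_format must be ai_center or auto")
    if int(settings["BOW.hyperparameter_search.timeout"]) <= 0:
        raise ValueError("BOW.hyperparameter_search.timeout must be a positive number of seconds")
    for name in BOOLEAN_VARIABLES:
        if settings[name] not in ("True", "False"):
            raise ValueError(f"{name} must be True or False")
    return settings


# Settings matching the CSV and JSON examples earlier on this page.
settings = effective_settings({
    "dataset.input_format": "auto",
    "dataset.input_column_name": "text",
    "dataset.target_column_name": "label",
})
print(settings)
```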

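Optional variables

To add further optional variables, click the Add new button. However, if the BOW.hyperparameter_search.enable variable is set to True, the optimal values for these variables are found by the search. For the following optional parameters to be used by the model, set BOW.hyperparameter_search.enable to False.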

BOW.lr_kwargs.class_weight
- Supported values are balanced or None.
BOW.ngram_range
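- The range of lengths of consecutive word sequences (n-grams) that can be considered as features for the model.
- Specify it in the format (1, x), where x is the maximum sequence length you want to allow.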
BOW.min_df
- Sets the minimum number of occurrences of an n-gram in the dataset for it to be considered as a feature.
- Recommended values are between 0 and 10.
dataset.text_pp_remove_stop_words
- Sets whether stop words (words such as the, or) are included in the search.
- Supported values are True or False.
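
To show how BOW.ngram_range, BOW.min_df, and dataset.text_pp_remove_stop_words interact, here is a small pure-Python sketch of n-gram feature selection. It is not the package's implementation: the function names, the whitespace tokenizer, and the two-word stop list are assumptions, and min_df is counted as the number of texts containing an n-gram, which is the convention used by scikit-learn's CountVectorizer.

```python
from collections import Counter

STOP_WORDS = {"the", "or"}  # illustrative subset only, not the package's list


def extract_ngrams(text, ngram_range=(1, 2), remove_stop_words=False):
    """Return every n-gram of the text whose length falls within ngram_range."""
    tokens = text.lower().split()
    if remove_stop_words:
        tokens = [token for token in tokens if token not in STOP_WORDS]
    low, high = ngram_range
    return [
        " ".join(tokens[start:start + n])
        for n in range(low, high + 1)
        for start in range(len(tokens) - n + 1)
    ]


def build_vocabulary(texts, ngram_range=(1, 2), min_df=1, remove_stop_words=False):
    """Keep the n-grams that appear in at least min_df of the texts."""
    document_frequency = Counter()
    for text in texts:
        # set() makes each text count an n-gram at most once.
        document_frequency.update(set(extract_ngrams(text, ngram_range, remove_stop_words)))
    return sorted(gram for gram, count in document_frequency.items() if count >= min_df)


texts = ["I like this movie", "I like the acting", "I hated the acting"]
vocabulary = build_vocabulary(texts, ngram_range=(1, 2), min_df=2, remove_stop_words=True)
print(vocabulary)  # ['acting', 'i', 'i like', 'like']
```

Raising min_df or narrowing ngram_range shrinks the vocabulary, which trades some detail for a smaller feature set.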