Automation Suite

2023.10


Automation Suite on Linux Installation Guide

Last updated: April 19, 2024

Management alerts

alertmanager.rules

AlertmanagerConfigInconsistent

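These are internal AlertManager errors that apply to HA clusters with multiple AlertManager replicas. Alerts may intermittently appear and disappear. Temporarily scaling the AlertManager replicas down and then back up may resolve the issue.

To fix the issue, take the following steps:

Scale to zero. Note that it takes a moment for the pods to shut down: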
```
kubectl scale statefulset -n cattle-monitoring-system alertmanager-rancher-monitoring-alertmanager --replicas=0
```

Scale back up to 2:

kubectl scale statefulset -n cattle-monitoring-system alertmanager-rancher-monitoring-alertmanager --replicas=2

Check that the Alertmanager pods have started and are in the running state:
```
kubectl get po -n cattle-monitoring-system
```

If the issue persists, contact UiPath® Support.
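The steps above can be sketched as a single shell function that waits between the two scale operations instead of relying on a manual pause. This is a minimal sketch, not an official UiPath script: the helper names `wait_for_count` and `restart_alertmanager`, the polling interval, and the number of attempts are assumptions chosen for illustration. It polls the StatefulSet's `status.replicas` and `status.readyReplicas` fields.

```shell
NAMESPACE="cattle-monitoring-system"
STATEFULSET="alertmanager-rancher-monitoring-alertmanager"

# Hypothetical helper: poll a StatefulSet status field until it equals the
# wanted value. Arguments: field, wanted, attempts (default 30), delay in
# seconds (default 10).
wait_for_count() {
  field="$1"
  wanted="$2"
  attempts="${3:-30}"
  delay="${4:-10}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    current="$(kubectl get statefulset "$STATEFULSET" -n "$NAMESPACE" \
      -o jsonpath="{.status.${field}}")"
    # A field that is not set is printed as an empty string; treat it as 0.
    if [ "${current:-0}" = "$wanted" ]; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "timed out waiting for status.${field}=${wanted}" >&2
  return 1
}

# Hypothetical wrapper: scale to zero, wait, scale back to 2, wait, list pods.
restart_alertmanager() {
  kubectl scale statefulset -n "$NAMESPACE" "$STATEFULSET" --replicas=0 || return 1
  wait_for_count replicas 0 || return 1
  kubectl scale statefulset -n "$NAMESPACE" "$STATEFULSET" --replicas=2 || return 1
  wait_for_count readyReplicas 2 || return 1
  kubectl get po -n "$NAMESPACE"
}
```

After sourcing the snippet on a node with `kubectl` access, run `restart_alertmanager`. A non-zero return status means one of the steps failed or timed out.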

AlertmanagerFailedReload

AlertManager failed to load or reload its configuration. Check any custom AlertManager configuration for input errors. If there are none, contact UiPath® Support.

AlertmanagerMembersInconsistent

To fix the issue, take the following steps:

Scale to zero. Note that it takes a moment for the pods to shut down:

kubectl scale statefulset -n cattle-monitoring-system alertmanager-rancher-monitoring-alertmanager --replicas=0

Scale back up to 2:

kubectl scale statefulset -n cattle-monitoring-system alertmanager-rancher-monitoring-alertmanager --replicas=2

Check that the Alertmanager pods have started and are in the running state:
```
kubectl get po -n cattle-monitoring-system
```

If the issue persists, contact UiPath® Support.

general.rules

TargetDown

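While this alert is firing, Prometheus cannot collect metrics from the target named in the alert. As a result, Grafana dashboards and any other alerts based on metrics from that target are unavailable. Check the other alerts that relate to that target.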
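To see which targets Prometheus currently fails to scrape, you can query the built-in `up` metric, which Prometheus sets to 0 for a target whose last scrape failed. The following is a minimal sketch using the Prometheus HTTP query API. The `PROM_URL` value and the function name `list_down_targets` are assumptions: the sketch presumes that the Prometheus endpoint is reachable from your shell (for example through a port-forward), which this guide does not describe.

```shell
# Assumption: Prometheus is reachable at this address from the current shell.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Hypothetical helper: return the raw JSON for all targets whose last scrape failed.
list_down_targets() {
  curl -sS -G "${PROM_URL}/api/v1/query" --data-urlencode 'query=up == 0'
}
```

The response is JSON. If `jq` is installed, `list_down_targets | jq '.data.result[].metric'` prints the labels of each affected target.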

Watchdog

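This alert exists to confirm that the entire alerting pipeline is functional. It is always firing, so it should always appear in AlertManager and be sent to a receiver. It is meant to be integrated with notification mechanisms that notify you when this alert stops firing, for example the DeadMansSnitch integration in PagerDuty.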
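The "notify when the alert stops firing" idea is a dead man's switch: the receiver records when it last saw Watchdog and raises an alarm once that timestamp becomes too old. The sketch below only illustrates the logic. The function name `watchdog_is_stale` and the way the last-seen timestamp is obtained are hypothetical, and real integrations such as DeadMansSnitch implement this on their own side.

```shell
# Hypothetical helper: succeeds (returns 0) when the last Watchdog notification
# is older than max_age seconds. All arguments are integers; last_seen and now
# are Unix timestamps.
watchdog_is_stale() {
  last_seen="$1"
  now="$2"
  max_age="$3"
  [ $((now - last_seen)) -gt "$max_age" ]
}
```

For example, `watchdog_is_stale "$last_seen" "$(date +%s)" 600 && echo "alerting pipeline may be broken"` would flag a gap of more than 10 minutes, where the 600-second limit is an arbitrary illustration.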

prometheus-operator

PrometheusOperatorListErrors, PrometheusOperatorWatchErrors, PrometheusOperatorSyncFailed, PrometheusOperatorReconcileErrors, PrometheusOperatorNodeLookupErrors, PrometheusOperatorNotReady, PrometheusOperatorRejectedResources

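These are internal errors of the Prometheus operator, which controls Prometheus resources. Prometheus itself may still be healthy while these errors are present, but they indicate that monitoring configurability is degraded. Contact UiPath® Support.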

Prometheus

PrometheusBadConfig

Prometheus failed to load or reload its configuration. Check any custom Prometheus configuration for input errors. If there are none, contact UiPath® Support.

PrometheusErrorSendingAlertsToSomeAlertmanagers, PrometheusErrorSendingAlertsToAnyAlertmanager, PrometheusNotConnectedToAlertmanagers

The connection from Prometheus to AlertManager is not healthy. Metrics may still be queryable and Grafana dashboards may still display them, but alerts will not fire. Check any custom AlertManager configuration for input errors. If there are none, contact UiPath® Support.

PrometheusNotificationQueueRunningFull, PrometheusTSDBReloadsFailing, PrometheusTSDBCompactionsFailing, PrometheusNotIngestingSamples, PrometheusDuplicateTimestamps, PrometheusOutOfOrderTimestamps, PrometheusRemoteStorageFailures, PrometheusRemoteWriteBehind, PrometheusRemoteWriteDesiredShards

These are internal Prometheus errors indicating that metrics may not be collected as expected. Contact UiPath® Support.

PrometheusRuleFailures

This can happen when there are malformed alerts based on non-existent metrics or incorrect PromQL syntax. If no custom alerts have been added, contact UiPath® Support.

PrometheusMissingRuleEvaluations

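Prometheus cannot evaluate whether alerts should be firing. This can happen when there are too many alerts. Remove expensive custom alert evaluations, or see the documentation on increasing the CPU limit for Prometheus, or both. If no custom alerts have been added, contact UiPath® Support.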

PrometheusTargetLimitHit

Prometheus has too many targets to collect from. If extra ServiceMonitors have been added (check the monitoring console), you can remove them.
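Besides the monitoring console, ServiceMonitors can be listed from the command line. The following sketch assumes the standard Prometheus operator `ServiceMonitor` custom resource is installed; the function name `list_servicemonitors` is illustrative. It sorts by creation time so that recently added ServiceMonitors appear last.

```shell
# Hypothetical helper: list every ServiceMonitor in the cluster, oldest first.
list_servicemonitors() {
  kubectl get servicemonitors.monitoring.coreos.com --all-namespaces \
    --sort-by=.metadata.creationTimestamp \
    -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,CREATED:.metadata.creationTimestamp
}
```

Entries you added yourself can then be removed with `kubectl delete servicemonitor <name> -n <namespace>`. Do not remove the ones that ship with the product.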

uipath.prometheus.resource.provisioning.alerts

PrometheusMemoryUsage, PrometheusStorageUsage

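These alerts warn when the cluster's memory and storage usage is approaching the configured limits. This is most likely to happen on clusters where usage has recently increased substantially (usually from Robots rather than users), or when nodes are added to the cluster without adjusting the Prometheus resources. The cause is an increase in the number of metrics being collected.

The rate of increase in storage usage is shown on the Kubernetes / Persistent Volumes dashboard.

You can adjust storage by resizing the PVC as described in Configuring the cluster.

The rate of increase in memory usage is shown on the Kubernetes / Compute Resources / Pod dashboard.

You can adjust memory by editing the Prometheus memory resource limits in the rancher-monitoring app in ArgoCD. The rancher-monitoring app re-syncs automatically after you click Save.

Note that Prometheus takes some time to restart and to resume showing metrics in Grafana. This usually takes less than 10 minutes, even on large clusters.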

uipath.availability.alerts

UiPathAvailabilityHighTrafficUserFacing

The number of HTTP 500 responses from UiPath® services exceeds the specified threshold.

Traffic level	Number of requests in 20 minutes	Error threshold (HTTP 5xx errors)
High	> 100,000	0.1%
Medium	10,000 to 100,000	1%
Low	< 10,000	5%
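The table can be read as a simple rule: the request count in the 20-minute window selects the traffic level, and the level selects the error-rate threshold. The sketch below encodes that rule with integer arithmetic. It is an illustration of the table only, not the actual alert expression: the function name `check_error_rate` is hypothetical, and the treatment of exactly 10,000 and exactly 100,000 requests as medium traffic is an assumption.

```shell
# Hypothetical helper. Arguments: total requests and HTTP 5xx responses in the
# 20-minute window. Prints "<level> exceeded" or "<level> ok".
check_error_rate() {
  requests="$1"
  errors="$2"
  if [ "$requests" -gt 100000 ]; then
    level="high"
    per_mille=1     # 0.1%
  elif [ "$requests" -ge 10000 ]; then
    level="medium"
    per_mille=10    # 1%
  else
    level="low"
    per_mille=50    # 5%
  fi
  # errors / requests > per_mille / 1000, rewritten without division.
  if [ $((errors * 1000)) -gt $((requests * per_mille)) ]; then
    echo "$level exceeded"
  else
    echo "$level ok"
  fi
}
```

For instance, 50,000 requests with 501 errors is medium traffic at just over 1%, so `check_error_rate 50000 501` reports the threshold as exceeded.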

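Errors in user-facing services are likely to cause degraded functionality that is directly observable in the Automation Suite UI. Errors in backend services, by contrast, may have less obvious consequences.

The alert indicates which service has a high error rate. To understand what cascading issues may come from other services that the reporting service depends on, you can use the Istio Workload dashboard, which shows errors between services.

Double-check any Automation Suite products that were recently reconfigured. Detailed logs are also available through the kubectl logs command. If the error persists, contact UiPath® Support.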

Backup

NFSServerDisconnected

This alert indicates that the connection to the NFS server has been lost.

Check the NFS server connection and the mount path.

VolumeBackupFailed

This alert indicates that the backup of a PVC failed.

BackupDisabled

This alert indicates that the backup has been disabled.

Check whether the cluster is healthy.

cronjob-alerts

CronJobSuspended

The uipath-infra/istio-configure-script-cronjob cronjob is in a suspended state.

To fix this issue, enable the cronjob by taking the following steps:

export KUBECONFIG="/etc/rancher/rke2/rke2.yaml" && export PATH="$PATH:/usr/local/bin:/var/lib/rancher/rke2/bin"
# Resume the suspended cronjob
kubectl -n uipath-infra patch cronjob istio-configure-script-cronjob -p '{"spec":{"suspend":false}}'
# Trigger one manual run and wait up to 300 seconds for it to complete
epoch=$(date +"%s")
kubectl -n uipath-infra create job istio-configure-script-cronjob-manual-$epoch --from=cronjob/istio-configure-script-cronjob
kubectl -n uipath-infra wait --for=condition=complete --timeout=300s job/istio-configure-script-cronjob-manual-$epoch
kubectl get node -o wide
# Verify that all the IPs listed by the command above appear in the output of the command below
kubectl -n istio-system get svc istio-ingressgateway -o json | jq '.spec.externalIPs'
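The final verification step, comparing node IPs against the ingress gateway's `externalIPs`, can be sketched as a small function rather than done by eye. This is only an illustration: the function name `missing_external_ips` is hypothetical, and it assumes that the IPs to compare are the nodes' `InternalIP` addresses (the INTERNAL-IP column of `kubectl get node -o wide`).

```shell
# Hypothetical helper: print every node InternalIP that is absent from the
# istio-ingressgateway externalIPs. Returns 1 if any IP is missing, else 0.
missing_external_ips() {
  node_ips="$(kubectl get nodes \
    -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}')"
  external_ips="$(kubectl -n istio-system get svc istio-ingressgateway \
    -o jsonpath='{.spec.externalIPs[*]}')"
  missing=0
  for ip in $node_ips; do
    # Pad with spaces so that 10.0.0.1 does not match 10.0.0.10.
    case " $external_ips " in
      *" $ip "*) ;;
      *) echo "$ip"; missing=1 ;;
    esac
  done
  return "$missing"
}
```

An empty output with a zero return status means every node IP is present in the service's `externalIPs`.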