AI Center
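Last updated June 6, 2024
Support
This page describes where to find the information you need to report a bug or troubleshoot an issue, both during installation and while using the product.
Note: The diagnostic tool works only after the application has been installed (step 5 of the documentation). Attach the resulting report to your support ticket.
We provide a diagnostic tool that helps you check the health of AI Fabric and identify problems with the installation. To run the diagnostic, connect to the AI Fabric host and run the following command: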
bash <(curl https://raw.githubusercontent.com/UiPath/ai-customer-scripts/master/platform/generate-report.sh)
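In offline mode, if you cannot reach the URL above from the machine itself, create a new file named generate-report.sh, paste the contents of the script above into it, and then run the following command: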
bash generate-report.sh
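The online and offline paths described above can be combined into one helper. The following is a minimal sketch, not part of the official tooling: the function name `run_diagnostics` and its directory argument are hypothetical, and downloading to a temporary file (rather than streaming into `bash`) is a choice made here so that a failed download is never executed.

```shell
#!/usr/bin/env bash
# Sketch: prefer a local copy of generate-report.sh, otherwise download it.
# REPORT_URL is the URL given in the documentation; everything else is illustrative.
REPORT_URL="https://raw.githubusercontent.com/UiPath/ai-customer-scripts/master/platform/generate-report.sh"

# run_diagnostics [DIR]
#   DIR: directory that may hold a manually copied generate-report.sh
#        (defaults to the current directory).
run_diagnostics() {
  local dir="${1:-.}"
  local script="$dir/generate-report.sh"
  local tmp_script status

  if [ -f "$script" ]; then
    # Offline mode: run the copy that was pasted onto the machine.
    bash "$script"
    return $?
  fi

  # Online mode: download first, run only if the download succeeded.
  # -f fails on HTTP errors, -sS hides progress but keeps error messages,
  # -L follows redirects.
  tmp_script="$(mktemp)"
  if curl -fsSL "$REPORT_URL" -o "$tmp_script"; then
    bash "$tmp_script"
    status=$?
  else
    echo "Download failed; copy generate-report.sh into $dir and retry." >&2
    status=1
  fi
  rm -f "$tmp_script"
  return "$status"
}
```

After sourcing the file, calling `run_diagnostics` with no argument uses the current directory, and `run_diagnostics /some/dir` looks for the pasted script in that directory first.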
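Running the script generates the file aifabric-diagnostics-latest.log (sample below). The report covers the status of the individual AI Fabric services, whether the required ports are open on the AI Fabric machine, whether files and ML packages can be uploaded, and information about your certificates and GPU status.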
Fetching Core Services Status
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 867 0 867 0 0 3454 0 --:--:-- --:--:-- --:--:-- 3468
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 862 0 862 0 0 3747 0 --:--:-- --:--:-- --:--:-- 3747
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 470 0 470 0 0 13055 0 --:--:-- --:--:-- --:--:-- 13055
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 569 0 569 0 0 14589 0 --:--:-- --:--:-- --:--:-- 14973
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 444 0 444 0 0 12333 0 --:--:-- --:--:-- --:--:-- 12333
Starting Orchestrator Connection Check
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 492 100 492 0 0 5857 0 --:--:-- --:--:-- --:--:-- 5927
Successfully received response from orchestrator: HTTP/2 200
cache-control: no-store, must-revalidate, no-cache, max-age=0
content-type: application/json; charset=utf-8
x-correlation-id: 655adc9a-df94-47b3-8a35-40ffed513acc
api-supported-versions: 10.0
x-content-type-options: nosniff
x-frame-options: DENY
strict-transport-security: max-age=31536000; includeSubDomains
server:
date: Tue, 08 Dec 2020 14:28:24 GMT
content-length: 492
{"keys":[{"alg":"RS256","e":"AQAB","kid":"BA16
...
Checking aifabric ports availability in the Cluster
aif.snvenkat1.xyz (52.178.221.160:31390) open
aif.snvenkat1.xyz (52.178.221.160:31443) open
aif.snvenkat1.xyz (52.178.221.160:6443) open
Open
Fetching Certificate Details from Orchstrator and AIFabric
depth=0 CN = aifabricqaorchtest.northeurope.cloudapp.azure.com
verify error:num=20:unable to get local issuer certificate
verify return:1
depth=0 CN = aifabricqaorchtest.northeurope.cloudapp.azure.com
verify error:num=21:unable to verify the first certificate
verify return:1
DONE
depth=2 C = US, ST = New Jersey, L = Jersey City, O = The USERTRUST Network, CN = USERTrust RSA Certification Authority
verify return:1
depth=1 C = AT, O = ZeroSSL, CN = ZeroSSL RSA Domain Secure Site CA
verify return:1
depth=0 CN = aif.snvenkat1.xyz
verify return:1
DONE
Check if GPU is installed in the Cluster!!
Node: dm-onebox
GPU Capacity : 1
GPU Node Found!
-----Analysis Start
Core Services Status:
Deployer : "UP"
Trainer : "UP"
PkgManagaer : "UP"
Helper : "UP"
AppManager : "UP"
RabbitMQ : "UP"
AIFabric Ports Status:
AIFabric Port (31390) : Open
Storage Port (31443) : Open
Kubernetes Port (6443) : Open
Databases Health:
Deployer DB : "UP"
Trainer DB : "UP"
Helper DB : "UP"
PkgManager DB : "UP"
AppManager DB : "UP"
DockerRegistry Health:
Deployer Registry : "UP"
Trainer Registry : "UP"
Orchestrator Connection Status:
Orchestrator connection is Healthy!
Certificates Check:
Your orchestrator certificate is valid for following IP/Hosts, please make sure it matches the host/IP you are using in AIFabric Setup.
DNS:aifabricqaorchtest.northeurope.cloudapp.azure.com
Expiry Date of the Orchestrator Certificate : Jul 22 12:11:30 2021 GMT
Your AIFabric Ingress Host certificate is valid for following IP/Hosts, please make sure it matches the host/IP you are using for AIFabric Setup in Orchestrator
DNS:aif.snvenkat1.xyz
Expiry Date of the AIFabric Certificate : Dec 24 23:59:59 2020 GMT
Storage Checks:
1. Object Storage - File Upload Test Successful
2. Object Storage - File Deletion Test Successful
GPU Drivers Check:
GPU Available and Working Fine. Total no of nodes with GPU - 1
-----Analysis End
**Report Generated on Tue Dec 8 14:28:31 UTC 2020
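A report of this size is easier to triage if only the unhealthy entries are printed. The sketch below assumes the layout of the sample above: the summary sits between the `-----Analysis Start` and `-----Analysis End` markers, healthy services report `"UP"`, and reachable ports report `Open`. The function name `report_problems` is hypothetical, and because the sample contains no failing entries, anything other than `"UP"` or `Open` is treated as a problem.

```shell
#!/usr/bin/env bash
# report_problems FILE
#   Prints every service, database, registry, or port line in the analysis
#   section whose status is not "UP" / Open.
#   Exit status: 0 if nothing was flagged, 1 otherwise.
report_problems() {
  awk '
    /Analysis Start/ { in_analysis = 1; next }
    /Analysis End/   { in_analysis = 0 }
    !in_analysis     { next }
    # Service, database, and registry lines end in a quoted status.
    /: *"[^"]*" *$/ && !/: *"UP" *$/ { print; bad = 1 }
    # Port lines name the port number and end in Open when reachable.
    /Port \([0-9]+\) *:/ && !/: *Open *$/ { print; bad = 1 }
    END { exit bad ? 1 : 0 }
  ' "$1"
}
```

For example, `report_problems aifabric-diagnostics-latest.log` prints nothing and returns 0 for the sample report shown above. Certificate expiry dates are not evaluated by this sketch and still need to be read manually.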
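Note: The support bundle is available once the infrastructure has been installed (step 4 of the documentation). Attach the bundle to your support ticket.
Navigate to the admin UI (<machine-ip>:8800) and click Troubleshoot in the top navigation bar. Click the button to generate a new support bundle, and then download it.
Contact the UiPath support team, who can use the bundle you provide to resolve your issue.
If for some reason you cannot create the support bundle from the admin console, create it from a Linux terminal with the following commands: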
curl https://krew.sh/support-bundle | bash
kubectl support-bundle https://kots.io
Create a spec.yaml file on the machine with the following content:
apiVersion: troubleshoot.replicated.com/v1beta1
kind: Collector
metadata:
  name: collector-sample
spec:
  collectors:
    - clusterInfo: {}
    - clusterResources: {}
    - ceph: {}
    - exec:
        args:
          - "-U"
          - kotsadm
        collectorName: kotsadm-postgres-db
        command:
          - pg_dump
        containerName: kotsadm-postgres
        name: kots/admin_console
        selector:
          - app=kotsadm-postgres
        timeout: 10s
    - logs:
        collectorName: kotsadm-postgres-db
        name: kots/admin_console
        selector:
          - app=kotsadm-postgres
    - logs:
        collectorName: kotsadm-api
        name: kots/admin_console
        selector:
          - app=kotsadm-api
    - logs:
        collectorName: kotsadm-operator
        name: kots/admin_console
        selector:
          - app=kotsadm-operator
    - logs:
        collectorName: kotsadm
        name: kots/admin_console
        selector:
          - app=kotsadm
    - logs:
        collectorName: kurl-proxy-kotsadm
        name: kots/admin_console
        selector:
          - app=kurl-proxy-kotsadm
    - secret:
        collectorName: kotsadm-replicated-registry
        includeValue: false
        key: .dockerconfigjson
        name: kotsadm-replicated-registry
    - logs:
        collectorName: rook-ceph-agent
        selector:
          - app=rook-ceph-agent
        namespace: rook-ceph
        name: kots/rook
    - logs:
        collectorName: rook-ceph-mgr
        selector:
          - app=rook-ceph-mgr
        namespace: rook-ceph
        name: kots/rook
    - logs:
        collectorName: rook-ceph-mon
        selector:
          - app=rook-ceph-mon
        namespace: rook-ceph
        name: kots/rook
    - logs:
        collectorName: rook-ceph-operator
        selector:
          - app=rook-ceph-operator
        namespace: rook-ceph
        name: kots/rook
    - logs:
        collectorName: rook-ceph-osd
        selector:
          - app=rook-ceph-osd
        namespace: rook-ceph
        name: kots/rook
    - logs:
        collectorName: rook-ceph-osd-prepare
        selector:
          - app=rook-ceph-osd-prepare
        namespace: rook-ceph
        name: kots/rook
    - logs:
        collectorName: rook-ceph-rgw
        selector:
          - app=rook-ceph-rgw
        namespace: rook-ceph
        name: kots/rook
    - logs:
        collectorName: rook-discover
        selector:
          - app=rook-discover
        namespace: rook-ceph
        name: kots/rook
Then run the following command:
kubectl support-bundle /path/to/spec.yaml
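Before attaching the result to a ticket, it can help to confirm that an archive was actually written and can be read. The sketch below rests on an assumption not stated in this document: that the plugin writes a gzip-compressed tar archive whose name starts with `support-bundle` into the directory where the command was run. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# latest_bundle [DIR]
#   Prints the most recently modified support-bundle*.tar.gz in DIR
#   (default: current directory). Prints nothing if none exists.
latest_bundle() {
  local dir="${1:-.}"
  # ls -t sorts newest first; head keeps only the first match.
  ls -t "$dir"/support-bundle*.tar.gz 2>/dev/null | head -n 1
}

# bundle_is_readable FILE
#   Succeeds only if FILE is non-empty and tar can list its contents.
bundle_is_readable() {
  [ -s "$1" ] && tar -tzf "$1" >/dev/null 2>&1
}
```

A typical check would be `bundle="$(latest_bundle)"` followed by `bundle_is_readable "$bundle" && echo "Attach: $bundle"`. If the archive name on your system differs, adjust the glob accordingly.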
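Contact the UiPath support team, who can use the bundle you provide to resolve your issue.
When reporting a Data Manager issue, include the generated logs. To retrieve them:
- Click the question mark in the upper-right corner of Data Manager. The Data Manager help menu is displayed.
- In the Bug Report section, click Collect recent logs for bug report. The Recent Logs window is displayed.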