Automation Suite Installation Guide
Last updated August 26, 2024

Adding a dedicated agent node with GPU support

Note:

Automation Suite currently supports only NVIDIA GPU drivers. See the list of GPU-supported operating systems.

For more information on cloud-specific instance types, see the following:

Before adding a dedicated agent node with GPU support, make sure to check the hardware requirements.

Installing a GPU driver on the machine

Note:
  • The following instructions apply to both online and offline Automation Suite installations. For offline installations, you must ensure temporary internet access to retrieve the required GPU driver dependencies. If you run into issues while installing the GPU driver, contact NVIDIA support.

  • The GPU driver is stored under the /opt/nvidia and /usr folders. It is strongly recommended that these folders be at least 5 GiB and 15 GiB, respectively, on the GPU agent machine.
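
As an optional sanity check before installing the driver (not part of the official procedure), you can confirm that both folders have enough free capacity:

    # Show available space on the filesystems backing /opt and /usr
    df -h /opt /usr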
  1. To install the GPU driver on the agent node, run the following commands:
    sudo yum install kernel kernel-tools kernel-headers kernel-devel 
    sudo reboot
    sudo yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
    sudo sed 's/$releasever/8/g' -i /etc/yum.repos.d/epel.repo
    sudo sed 's/$releasever/8/g' -i /etc/yum.repos.d/epel-modular.repo
    sudo yum config-manager --add-repo http://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo
    sudo yum install cuda
  2. To install the container toolkit, run the following commands:
    curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
            sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
    sudo yum-config-manager --enable nvidia-container-toolkit-experimental
    sudo yum install -y nvidia-container-toolkit
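
Optionally, before proceeding, you can confirm that the installed kernel headers match the running kernel (a common cause of driver build failures) and that the container runtime binary is on the PATH. This is an informal check, not part of the official procedure:

    uname -r                            # running kernel version
    rpm -q kernel-devel kernel-headers  # should match the running kernel
    which nvidia-container-runtime      # should print a path after the toolkit install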

Verifying that the driver was installed correctly

Run the sudo nvidia-smi command on the node to verify that the driver was installed correctly.
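
For a scripted variant of the same check, a minimal sketch such as the following can be used; it assumes that nvidia-smi -L lists at least one GPU on a correctly configured node:

    # List detected GPUs; a healthy node prints lines such as "GPU 0: ..."
    if sudo nvidia-smi -L | grep -q "GPU"; then
      echo "GPU driver is working"
    else
      echo "No GPU detected; check the driver installation"
    fi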


Note: Additional steps are required to configure the provisioned GPUs once the cluster setup is complete.

At this point, the GPU drivers have been installed and the GPU nodes have been added to the cluster.

Adding a GPU node to the cluster

Step 1: Configuring the machine

Follow these steps to configure the machine, ensuring the disk is partitioned correctly and all networking requirements are met.

Step 2: Copying the interactive installer to the target machine

Online installation

  1. Connect to any of the server machines via SSH.
  2. Run the following commands to copy the contents of the UiPathAutomationSuite folder to the GPU node (use the username and DNS specific to the GPU node):
    sudo su -
    scp -r /opt/UiPathAutomationSuite <username>@<node dns>:/opt/
    scp -r ~/* <username>@<node dns>:/opt/UiPathAutomationSuite/
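
To confirm the copy succeeded, you can list the installer on the GPU node; <username> and <node dns> are the same placeholders used above:

    ssh <username>@<node dns> 'ls -l /opt/UiPathAutomationSuite/installUiPathAS.sh'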

Offline installation

  1. Connect to any of the server nodes via SSH.
  2. Make sure the /opt/UiPathAutomationSuite directory contains the sf-infra.tar.gz file (as mentioned in the installation package download step), then copy it to the GPU node:
    scp -r /opt/UiPathAutomationSuite <username>@<node dns>:/var/tmp
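
As an optional check, verify that the bundle arrived intact on the GPU node; this uses the same placeholders as above:

    ssh <username>@<node dns> 'ls -lh /var/tmp/UiPathAutomationSuite/sf-infra.tar.gz'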

Step 3: Running the interactive installer wizard to configure the dedicated node

Online installation

  1. Connect to the GPU node via SSH.
  2. Run the following commands:
    sudo su -
    cd /opt/UiPathAutomationSuite
    chmod -R 755 /opt/UiPathAutomationSuite
    yum install unzip jq -y
    CONFIG_PATH=/opt/UiPathAutomationSuite/cluster_config.json 
    
    UNATTENDED_ACTION="accept_eula,download_bundle,extract_bundle,join_gpu" ./installUiPathAS.sh
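
After the installer finishes, you can confirm from any server node that the GPU node has joined the cluster; the new node should appear in the list with a Ready status:

    kubectl get nodes -o wide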

Offline installation

  1. Connect to the dedicated GPU node via SSH.
  2. Install the platform bundle on the dedicated GPU node using the following script:
    sudo su 
    mv /var/tmp/UiPathAutomationSuite /opt
    cd /opt/UiPathAutomationSuite
    chmod -R 755 /opt/UiPathAutomationSuite
    
    ./install-uipath.sh -i ./cluster_config.json -o ./output.json -k -j gpu --offline-bundle ./sf-infra.tar.gz --offline-tmp-folder /opt/UiPathAutomationSuite/tmp --install-offline-prereqs --accept-license-agreement
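
The installer writes its result to the file passed via the -o flag; as a quick check you can inspect it on the GPU node after the run completes:

    cat /opt/UiPathAutomationSuite/output.json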

Configuring the GPU driver on the cluster

Step 1: Installing the GPU driver on the cluster

  1. Make sure you are connected to the GPU machine via SSH.
  2. Update the containerd configuration of the GPU node by running the following commands:
    cat <<EOF > gpu_containerd.sh
    # Abort if the NVIDIA driver or the NVIDIA container runtime is missing.
    if ! nvidia-smi &>/dev/null;
    then
      echo "GPU Drivers are not installed on the VM. Please refer the documentation."
      exit 0
    fi
    if ! which nvidia-container-runtime &>/dev/null;
    then
      echo "Nvidia container runtime is not installed on the VM. Please refer the documentation."
      exit 0 
    fi
    # Skip if the containerd config has already been patched.
    grep "nvidia-container-runtime" /var/lib/rancher/rke2/agent/etc/containerd/config.toml &>/dev/null && info "GPU containerd changes already applied" && exit 0
    # Set nvidia-container-runtime as the default runtime in the rke2 containerd config template.
    awk '1;/plugins.cri.containerd]/{print "  default_runtime_name = \"nvidia-container-runtime\""}' /var/lib/rancher/rke2/agent/etc/containerd/config.toml > /var/lib/rancher/rke2/agent/etc/containerd/config.toml.tmpl
    echo -e '\n[plugins.linux]\n  runtime = "nvidia-container-runtime"' >> /var/lib/rancher/rke2/agent/etc/containerd/config.toml.tmpl
    echo -e '\n[plugins.cri.containerd.runtimes.nvidia-container-runtime]\n  runtime_type = "io.containerd.runc.v2"\n  [plugins.cri.containerd.runtimes.nvidia-container-runtime.options]\n    BinaryName = "nvidia-container-runtime"' >> /var/lib/rancher/rke2/agent/etc/containerd/config.toml.tmpl
    EOF
    sudo bash gpu_containerd.sh
  3. Restart rke2-agent by running the following command:
    systemctl restart rke2-agent
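
Optionally, you can verify that the containerd template was patched and that the agent came back up; a minimal sketch:

    # The template should now reference the NVIDIA runtime
    grep "nvidia-container-runtime" /var/lib/rancher/rke2/agent/etc/containerd/config.toml.tmpl
    # The agent should report active (running)
    systemctl status rke2-agent --no-pager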

Step 2: Enabling the GPU in the cluster

  1. Run the following commands on any of the server nodes.
  2. Navigate to the UiPathAutomationSuite folder:
    cd /opt/UiPathAutomationSuite

Enabling the GPU in online installations

DOCKER_REGISTRY_URL=$(cat defaults.json | jq -er ".registries.docker.url")
sed -i "s/REGISTRY_PLACEHOLDER/${DOCKER_REGISTRY_URL}/g" ./Infra_Installer/gpu_plugin/nvidia-device-plugin.yaml
kubectl apply -f ./Infra_Installer/gpu_plugin/nvidia-device-plugin.yaml
kubectl -n kube-system rollout restart daemonset nvidia-device-plugin-daemonset
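
To check that the device plugin rolled out and that the GPU is now advertised to the scheduler, you can run commands like the following from a server node; the custom-columns expression is just one way to surface the allocatable GPU count:

kubectl -n kube-system rollout status daemonset nvidia-device-plugin-daemonset
kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"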

Enabling the GPU in offline installations

DOCKER_REGISTRY_URL=localhost:30071
sed -i "s/REGISTRY_PLACEHOLDER/${DOCKER_REGISTRY_URL}/g" ./Infra_Installer/gpu_plugin/nvidia-device-plugin.yaml
kubectl apply -f ./Infra_Installer/gpu_plugin/nvidia-device-plugin.yaml
kubectl -n kube-system rollout restart daemonset nvidia-device-plugin-daemonset
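
In offline installations the plugin image is pulled from the cluster-local registry at localhost:30071. If the daemonset pods fail to pull, you can confirm the registry responds; this sketch assumes it exposes the standard Docker Registry v2 API over HTTPS with a self-signed certificate:

curl -sk https://localhost:30071/v2/_catalog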
