4. Run the AI Center Infrastructure Installer

Run the AI Center infrastructure installer. Completing this installer produces the Kots admin console, where you can manage application updates and configuration, monitor resource usage (CPU/memory pressure), and download support bundles to troubleshoot any issues.

❗️

Caution

Do not kill the process or shut down the machine while this step is running. This step completes in 15-25 minutes. If you accidentally terminate the process partway through, the machine may need to be re-provisioned (this should not be necessary in most cases) and brand new disks attached.



Online Installation

The first step is to download the installer archive and move it to the AI Center server. Alternatively, you can download it directly from the machine using the following command:

🚧

Make sure you have enough space

The script will download some files locally as part of the installation process; please make sure you have 4 GB available in the directory from which you are executing the script.
By default, Azure RHEL VMs have only 1 GB available in the home directory, which is the default directory.
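You can verify the available space before launching the installer; a quick check from the directory where you plan to run the script:

# show free space on the filesystem backing the current directory
df -h .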

wget https://download.uipath.com/aicenter/online-installer/v2021.4.0/aicenter-installer-v21.4.0.tar.gz

Then untar the file and go inside the main folder using the following commands:

tar -xvf aicenter-installer-v21.4.0.tar.gz
cd ./aicenter-installer-v21.4.0

You can then run the AI Center installer:

./setup.sh
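Because the installer must not be interrupted (see the caution above), you may want to run it inside a terminal multiplexer so that an SSH disconnect does not kill it. A minimal sketch, assuming tmux is available on your machine (it is not part of the official prerequisites):

# start a named session and run the installer inside it;
# if your SSH connection drops, reconnect and reattach with: tmux attach -t aicenter
tmux new -s aicenter
./setup.sh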

📘

Some commands require sudo permissions, so if you are not already logged in as root, the script may ask you to enter the root password.

The first step is to accept the license agreement by pressing Y. The script then asks what type of platform you want to install; enter onebox and press Enter.

You will then be asked whether a GPU is available for your setup; answer Y or N depending on your hardware. Make sure that the drivers are already installed.

A few preflight checks run before the script actually starts. These checks verify that your machine has the correct configuration; they are not foolproof, but they are a good indicator. In particular, if you are using Ubuntu, you may see the following line:
[WARN] Host UFW status: UFW is active

UFW is a firewall installed by default on recent Ubuntu releases. With the default configuration there shouldn't be any issue, but specific rules can interfere with Kubernetes behavior. If this message is displayed, you can continue by pressing Y.
Similarly, firewalld and SELinux are not compatible with Kubernetes. If you receive a similar warning about either of them, proceed by pressing Y.
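If you want to inspect these settings yourself before answering, the standard status commands are shown below; which ones apply depends on your distribution (ufw on Ubuntu, firewalld and SELinux on RHEL-family systems):

sudo ufw status                     # Ubuntu: shows whether UFW is active
sudo systemctl status firewalld     # RHEL/CentOS: shows whether firewalld is running
getenforce                          # SELinux mode: Enforcing, Permissive, or Disabled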

🚧

Using GPU

Only NVIDIA GPUs are supported, and the drivers need to be installed prior to AI Center installation.

Depending on your system, you might be asked to press Y a few times for the installation to complete.

This step takes between 15 and 25 minutes to complete. Upon completion, the terminal output shows the message Installation Complete.

Airgapped Installation

On a local machine with access to a browser (e.g. a Windows server), download the install bundle using the link provided by your account manager.

Before extracting the downloaded file, make sure you have at least 35 GB available for the extracted content.
Extract the contents of the downloaded file using 7-Zip from Windows File Explorer, or tar -zxvf aicenter-installer-v21.4.0.tar.gz from a machine that supports tar.

This will create two files:

  • aicenter-airgap-infra-v21.4.0.tar.gz containing infrastructure components (about 3.6 GB)
  • aicenter-v21.4.0.airgap containing application components (about 8.7 GB). This will be uploaded to the UI in step 5. Run the AI Center Application Installer.
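After extraction, you can sanity-check that both files are present and roughly match the sizes above, for example:

ls -lh aicenter-airgap-infra-v21.4.0.tar.gz aicenter-v21.4.0.airgap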

Copy aicenter-airgap-infra-v21.4.0.tar.gz to the airgapped AI Center machine.
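One way to copy the file is scp, assuming you have SSH access to the airgapped machine; <user> and <aicenter-host> below are placeholders for your environment:

# <user> and <aicenter-host> are hypothetical placeholders for your environment
scp aicenter-airgap-infra-v21.4.0.tar.gz <user>@<aicenter-host>:~/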

Then run the following commands to start the infrastructure installer:

tar -zxvf aicenter-airgap-infra-v21.4.0.tar.gz
cd aicenter-airgap-infra-v21.4.0
sudo ./setup.sh

Admin Console Access

In both cases, a successful installation outputs the address and password of the Kots admin UI.

...
Install Successful:
configmap/kurl-config created
                Installation
                  Complete ✔

Kotsadm: http://13.59.108.17:8800
Login with password (will not be shown again): NNqKCY82S

The UIs of Prometheus, Grafana and Alertmanager have been exposed on NodePorts 30900, 
30902 and 30903 respectively.
To access Grafana use the generated user:password of admin:msDX5VZ9m .

To access the cluster with kubectl, reload your shell:
    bash -l
    
...

Note that the address of the kotsadm UI is <machine-ip>:8800. In some cases, the internal IP address may be displayed instead of the public one; make sure you use the public IP when accessing it from outside.
The login password is displayed on the line below the address. Make a note of this password. You can regenerate it if it is lost or if you would like to reset it:

bash -l
kubectl kots reset-password -n default
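After reloading your shell as shown above, you can also confirm that kubectl can reach the cluster, for example:

kubectl get nodes    # the single node should report a Ready status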

Adding GPU after install

If a GPU wasn't available during installation but is later added to the machine, you need to complete the following steps to make it accessible to AI Center.
First, validate that the GPU drivers are correctly installed (a prerequisite) by running the command:

nvidia-smi

You should see GPU information. If you see an error instead, the GPU is not accessible or the drivers are not correctly installed; fix that before proceeding.
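If you prefer to script this check, a minimal sketch that fails fast when the driver is not reachable:

# exit non-zero if nvidia-smi cannot talk to the driver
if ! nvidia-smi > /dev/null 2>&1; then
    echo "GPU not accessible or drivers not correctly installed" >&2
    exit 1
fi
echo "GPU drivers look OK"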

Then run a script that adds the GPU to the cluster so that Pipelines and ML Skills can consume it.
For an online install, do the following:

# navigate to where you untarred the installer (or extract it again if you have removed it)
cd ./aicenter-installer-v21.4.0/infra/common/scripts
./attach_gpu_drivers.sh

For an airgapped install this file won't be present, so you need to create it first: navigate to the aif_infra directory and create the script there (it is important that nvidia-device-plugin.yaml is located in the same folder):

#!/bin/bash

function edit_daemon_json(){
    echo "################## Updating docker configuration ######################"
    sudo bash -c '
echo \
"{
    \"default-runtime\": \"nvidia\",
    \"exec-opts\": [\"native.cgroupdriver=systemd\"],
    \"runtimes\": {
        \"nvidia\": {
            \"path\": \"/usr/bin/nvidia-container-runtime\",
            \"runtimeArgs\": []
        }
    }
}" > /etc/docker/daemon.json'

}

function kubernetes_cluster_up() {
    count=0
    swapoff -a
    while [ $count -lt 50 ]; do
        sudo chmod +r /etc/kubernetes/admin.conf
        export KUBECONFIG=/etc/kubernetes/admin.conf
        result=$(kubectl get nodes| grep master)
        if [[ "$result" == *"master"* ]]; then
            echo "Kubernetes up after " $((count * 5)) "seconds"
            break
        else
            echo "Kubernetes not up, retry : " $count
            count=$(( $count + 1 ))
            sleep 5
        fi
    done

    if [ $count -eq 50 ]; then
        echo "Kubernetes failed to come up"
        exit 1
    fi
}

function validate_gpu_updated() {
    count=0
    while [ $count -lt 50 ]; do
        result=$(kubectl describe nodes| grep nvidia.com/gpu)
        if [[ "$result" == *"nvidia.com/gpu"* ]]; then
            echo $result
            echo "Node gpu info updated after " $((count * 5)) "seconds"
            echo "##################### Successfully installed GPU #########################"
            break
        else
            echo "kubectl gpu info not updated, retry : " $count
            count=$(( $count + 1 ))
            sleep 5
        fi
    done

    if [ $count -eq 50 ]; then
        echo "################## Failed to install gpu ####################"
        swapoff -a
        exit 1
    fi
}

function restart_docker() {
    sudo pkill -SIGHUP dockerd
    sudo systemctl restart docker
    count=0
    while [ $count -lt 50 ]; do
        result=$(sudo systemctl status docker| grep running)
        if [[ "$result" == *"running"* ]]; then
            echo "docker is up " $((count * 5)) "seconds"
            break
        else
            echo "docker is not up, retry : " $count
            count=$(( $count + 1 ))
            sleep 5
        fi
    done

    if [ $count -eq 50 ]; then
        echo "Docker failed to come up"
        swapoff -a
        exit 1
    fi
}

echo "################################# Attach Gpu Driver #####################################"
edit_daemon_json
echo "################################# restarting docker #####################################"
restart_docker
sleep 2
# This is required because when kubeadm init starts the kubelet, it checks the Docker cgroup
# driver and falls back to cgroupfs on discrepancy, so we change the driver and restart the kubelet.
echo "################################# restarting kubelet #####################################"
sudo sed -i 's/cgroup-driver=cgroupfs/cgroup-driver=systemd/' /var/lib/kubelet/kubeadm-flags.env
sudo systemctl restart kubelet
sleep 10
kubernetes_cluster_up
kubectl apply -f nvidia-device-plugin.yaml
validate_gpu_updated

Then make the script executable and run it:

chmod +x attach_gpu_drivers.sh
./attach_gpu_drivers.sh

At the end, you should see a message indicating that the GPU was installed successfully.

Troubleshooting

The infrastructure installer is not idempotent. This means that running the installer again (after you have already run it once) will not work. If this installer fails, you will need to reprovision a new machine with fresh disks.

The most common sources of error are the boot disk becoming full during the install, or the external data disks being already mounted or formatted. Remember to only attach the disks; do not format them.
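You can check the state of the attached disks with lsblk; for raw, unformatted, unmounted data disks the FSTYPE and MOUNTPOINT columns should be empty:

# data disks should show no filesystem and no mountpoint
lsblk -f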

If the installation fails with unformatted disks and a sufficiently large boot disk, contact our support team and include a support bundle in your email. A support bundle can be generated by running these commands:

curl https://krew.sh/support-bundle | bash
kubectl support-bundle https://kots.io

Alternatively, if you don't have access to the internet, you can create a file named support-bundle.yaml with the following content:

apiVersion: troubleshoot.replicated.com/v1beta1
kind: Collector
metadata:
  name: collector-sample
spec:
  collectors:
    - clusterInfo: {}
    - clusterResources: {}
    - exec:
        args:
          - "-U"
          - kotsadm
        collectorName: kotsadm-postgres-db
        command:
          - pg_dump
        containerName: kotsadm-postgres
        name: kots/admin_console
        selector:
          - app=kotsadm-postgres
        timeout: 10s
    - logs:
        collectorName: kotsadm-postgres-db
        name: kots/admin_console
        selector:
          - app=kotsadm-postgres
    - logs:
        collectorName: kotsadm-api
        name: kots/admin_console
        selector:
          - app=kotsadm-api
    - logs:
        collectorName: kotsadm-operator
        name: kots/admin_console
        selector:
          - app=kotsadm-operator
    - logs:
        collectorName: kotsadm
        name: kots/admin_console
        selector:
          - app=kotsadm
    - logs:
        collectorName: kurl-proxy-kotsadm
        name: kots/admin_console
        selector:
          - app=kurl-proxy-kotsadm
    - secret:
        collectorName: kotsadm-replicated-registry
        includeValue: false
        key: .dockerconfigjson
        name: kotsadm-replicated-registry
    - logs:
        collectorName: rook-ceph-agent
        selector:
          - app=rook-ceph-agent
        namespace: rook-ceph
        name: kots/rook
    - logs:
        collectorName: rook-ceph-mgr
        selector:
          - app=rook-ceph-mgr
        namespace: rook-ceph
        name: kots/rook
    - logs:
        collectorName: rook-ceph-mon
        selector:
          - app=rook-ceph-mon
        namespace: rook-ceph
        name: kots/rook
    - logs:
        collectorName: rook-ceph-operator
        selector:
          - app=rook-ceph-operator
        namespace: rook-ceph
        name: kots/rook
    - logs:
        collectorName: rook-ceph-osd
        selector:
          - app=rook-ceph-osd
        namespace: rook-ceph
        name: kots/rook
    - logs:
        collectorName: rook-ceph-osd-prepare
        selector:
          - app=rook-ceph-osd-prepare
        namespace: rook-ceph
        name: kots/rook
    - logs:
        collectorName: rook-ceph-rgw
        selector:
          - app=rook-ceph-rgw
        namespace: rook-ceph
        name: kots/rook
    - logs:
        collectorName: rook-discover
        selector:
          - app=rook-discover
        namespace: rook-ceph
        name: kots/rook

Then create the support bundle file using the following command:

kubectl support-bundle support-bundle.yaml

This will create a file called supportbundle.tar.gz which you can upload when raising a support ticket.
