automation-suite
2021.10
false
- Overview
- Requirements
- Installation
- Post-installation
- Cluster administration
- Monitoring and alerting
- Migration and upgrade
- Product-specific configuration
- Best practices and maintenance
- Troubleshooting
- How to Troubleshoot Services During Installation
- How to Uninstall the Cluster
- How to clean up offline artifacts to improve disk space
- How to disable TLS 1.0 and 1.1
- How to enable Istio logging
- How to manually clean up logs
- How to clean up old logs stored in the sf-logs bundle
- How to debug failed Automation Suite installations
- How to disable TX checksum offloading
- Unable to run an offline installation on RHEL 8.4 OS
- Error in Downloading the Bundle
- Offline installation fails because of missing binary
- Certificate issue in offline installation
- SQL connection string validation error
- Failure After Certificate Update
- Automation Suite Requires Backlog_wait_time to Be Set 1
- Cannot Log in After Migration
- Setting a timeout interval for the management portals
- Update the underlying directory connections
- Kinit: Cannot Find KDC for Realm <AD Domain> While Getting Initial Credentials
- Kinit: Keytab Contains No Suitable Keys for *** While Getting Initial Credentials
- GSSAPI Operation Failed With Error: An Invalid Status Code Was Supplied (Client's Credentials Have Been Revoked).
- Login Failed for User <ADDOMAIN><aduser>. Reason: The Account Is Disabled.
- Alarm Received for Failed Kerberos-tgt-update Job
- SSPI Provider: Server Not Found in Kerberos Database
- Failure to get the sandbox image
- Pods not showing in ArgoCD UI
- Redis Probe Failure
- RKE2 Server Fails to Start
- Secret Not Found in UiPath Namespace
- ArgoCD goes into progressing state after first installation
- Unexpected Inconsistency; Run Fsck Manually
- Missing Self-heal-operator and Sf-k8-utils Repo
- Degraded MongoDB or Business Applications After Cluster Restore
- Unhealthy Services After Cluster Restore or Rollback
- Using the Automation Suite Diagnostics Tool
- Using the Automation Suite Support Bundle Tool
- Exploring Logs
Missing Self-heal-operator and Sf-k8-utils Repo
Automation Suite Installation Guide
Last updated Aug 26, 2024
Missing Self-heal-operator and Sf-k8-utils Repo
This issue causes workloads to go into
ImagePullBackOff
or ErrImagePull
state with the following error:
Failed to pull image "sf-k8-utils-rhel:<tag>": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/sf-k8-utils-rhel:<tag>": failed to resolve reference "docker.io/library/sf-k8-utils-rhel:<tag>": failed to do request: Head "https://localhost:30071/v2/library/sf-k8-utils-rhel/manifests/<tag>?ns=docker.io": dial tcp [::1]:30071: connect: connection refused
OR
Failed to pull image "self-heal-operator:<tag>": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/self-heal-operator:<tag>": failed to resolve reference "docker.io/library/self-heal-operator:<tag>": failed to do request: Head "https://localhost:30071/v2/library/self-heal-operator/manifests/<tag>?ns=docker.io": dial tcp [::1]:30071: connect: connection refused
Failed to pull image "sf-k8-utils-rhel:<tag>": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/sf-k8-utils-rhel:<tag>": failed to resolve reference "docker.io/library/sf-k8-utils-rhel:<tag>": failed to do request: Head "https://localhost:30071/v2/library/sf-k8-utils-rhel/manifests/<tag>?ns=docker.io": dial tcp [::1]:30071: connect: connection refused
OR
Failed to pull image "self-heal-operator:<tag>": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/self-heal-operator:<tag>": failed to resolve reference "docker.io/library/self-heal-operator:<tag>": failed to do request: Head "https://localhost:30071/v2/library/self-heal-operator/manifests/<tag>?ns=docker.io": dial tcp [::1]:30071: connect: connection refused
To fix this issue, run the following script from all the nodes in the cluster one by one.
#!/bin/bash
export KUBECONFIG=${KUBECONFIG:-/etc/rancher/rke2/rke2.yaml}
export PATH=$PATH:/var/lib/rancher/rke2/bin:${SCRIPT_DIR}/Fabric_Installer/bin:/usr/local/bin
function get_docker_registry_url() {
local rancher_registries_file="/etc/rancher/rke2/registries.yaml"
config=$(cat < ${rancher_registries_file} | grep -A1 "configs:"|tail -n1| awk '{print $0}'|tr -d ' '|tr -d '"')
url="${config::-1}"
echo "${url}"
}
function get_docker_registry_credentials() {
local key="$1"
local rancher_registries_file="/etc/rancher/rke2/registries.yaml"
value=$(cat < ${rancher_registries_file} | grep "$key:" | cut -d: -f2 | xargs)
echo "${value}"
}
function get_cluster_config() {
local key=$1
# the go template if prevents it from printing <no-value> instead of empty strings
value=$(kubectl get secret service-cluster-configurations \
-o "go-template={{if index .data \"${key^^}\"}}{{index .data \"${key^^}\"}}{{end}}" \
-n uipath-infra --ignore-not-found) || true
echo -n "$(base64 -d <<<"$value")"
}
function update_image_tag() {
username=$(get_docker_registry_credentials username)
password=$(get_docker_registry_credentials password)
url=$(get_docker_registry_url)
images=(self-heal-operator sf-k8-utils-rhel)
for image in ${images[@]}; do
echo "Start checking available $image tag"
tag=$(curl -u $username:$password -X GET https://${url}/v2/$image/tags/list -k -q -s | jq -rc .tags[0] )
if [[ "${tag}" != "null" ]]; then
echo "$image with tag ${tag} found..."
podman login ${url} --username $username --password $password --tls-verify=false
podman pull ${url}/${image}:${tag} --tls-verify=false
podman tag ${url}/${image}:${tag} ${url}/uipath/${image}:${tag}
podman tag ${url}/${image}:${tag} ${url}/library/${image}:${tag}
podman push ${url}/uipath/${image}:${tag} --tls-verify=false
podman push ${url}/library/${image}:${tag} --tls-verify=false
echo "$image is retag and push to docker registry"
else
echo "no tag available for $image"
fi
done
}
function validate_rke2_registry_config() {
local rancher_registries_file="/etc/rancher/rke2/registries.yaml"
local endpoint_present="false"
endpoint=$(cat < ${rancher_registries_file} | grep -A2 "docker.io:" | grep -A1 "endpoint:"|tail -n1|xargs)
if [[ -n "${endpoint}" ]]; then
endpoint_present="true"
fi
echo "${endpoint_present}"
}
function update_rke2_registry_config() {
local DOCKER_REGISTRY_URL=$(get_docker_registry_url)
local DOCKER_REGISTRY_LOCAL_USERNAME=$(get_docker_registry_credentials username)
local DOCKER_REGISTRY_LOCAL_PASSWORD=$(get_docker_registry_credentials password)
local registriesPath="/etc/rancher/rke2/registries.yaml"
local DOCKER_REGISTRY_NODEPORT=30071
echo "Create temp file with name ${registriesPath}_tmp"
cp -r ${registriesPath} ${registriesPath}_tmp
echo "Start updating ${registriesPath}"
cat > "${registriesPath}" <<EOF
mirrors:
docker-registry.docker-registry.svc.cluster.local:5000:
endpoint:
- "https://${DOCKER_REGISTRY_URL}"
docker.io:
endpoint:
- "https://${DOCKER_REGISTRY_URL}"
${DOCKER_REGISTRY_URL}:
endpoint:
- "https://${DOCKER_REGISTRY_URL}"
configs:
"localhost:${DOCKER_REGISTRY_NODEPORT}":
tls:
insecure_skip_verify: true
auth:
username: ${DOCKER_REGISTRY_LOCAL_USERNAME}
password: ${DOCKER_REGISTRY_LOCAL_PASSWORD}
EOF
}
function is_server_node() {
[[ "$(systemctl is-enabled rke2-server 2>>/dev/null)" == "enabled" ]] && echo -n "true" && return
echo "false"
}
function main() {
local is_server_node=$(is_server_node)
local install_type=$(get_cluster_config "INSTALL_TYPE")
if [[ "${install_type}" != "offline" ]]; then
echo "This script is compatible with only offline cluster. Current cluster install_type=${install_type}"
exit 0
fi
if [[ "${is_server_node}" == "true" ]]; then
echo "current node is identified as server node. Updating image tag"
update_image_tag
else
echo "current node is identified as agent node."
fi
rke2_registry_config_is_valid=$(validate_rke2_registry_config)
if [[ "${rke2_registry_config_is_valid}" == "false" ]]; then
echo "start updating rke2 config"
update_rke2_registry_config
if [[ "${is_server_node}" == "true" ]]; then
echo "Registry configuration is updated. Restarting service using command: systemctl restart rke2-server"
systemctl restart rke2-server.service
else
echo "Registry configuration is updated. Restarting service using command: systemctl restart rke2-agent"
systemctl restart rke2-agent.service
fi
else
echo "rke2 config update is not required"
fi
}
main
#!/bin/bash
export KUBECONFIG=${KUBECONFIG:-/etc/rancher/rke2/rke2.yaml}
export PATH=$PATH:/var/lib/rancher/rke2/bin:${SCRIPT_DIR}/Fabric_Installer/bin:/usr/local/bin
function get_docker_registry_url() {
local rancher_registries_file="/etc/rancher/rke2/registries.yaml"
config=$(cat < ${rancher_registries_file} | grep -A1 "configs:"|tail -n1| awk '{print $0}'|tr -d ' '|tr -d '"')
url="${config::-1}"
echo "${url}"
}
function get_docker_registry_credentials() {
local key="$1"
local rancher_registries_file="/etc/rancher/rke2/registries.yaml"
value=$(cat < ${rancher_registries_file} | grep "$key:" | cut -d: -f2 | xargs)
echo "${value}"
}
function get_cluster_config() {
local key=$1
# the go template if prevents it from printing <no-value> instead of empty strings
value=$(kubectl get secret service-cluster-configurations \
-o "go-template={{if index .data \"${key^^}\"}}{{index .data \"${key^^}\"}}{{end}}" \
-n uipath-infra --ignore-not-found) || true
echo -n "$(base64 -d <<<"$value")"
}
function update_image_tag() {
username=$(get_docker_registry_credentials username)
password=$(get_docker_registry_credentials password)
url=$(get_docker_registry_url)
images=(self-heal-operator sf-k8-utils-rhel)
for image in ${images[@]}; do
echo "Start checking available $image tag"
tag=$(curl -u $username:$password -X GET https://${url}/v2/$image/tags/list -k -q -s | jq -rc .tags[0] )
if [[ "${tag}" != "null" ]]; then
echo "$image with tag ${tag} found..."
podman login ${url} --username $username --password $password --tls-verify=false
podman pull ${url}/${image}:${tag} --tls-verify=false
podman tag ${url}/${image}:${tag} ${url}/uipath/${image}:${tag}
podman tag ${url}/${image}:${tag} ${url}/library/${image}:${tag}
podman push ${url}/uipath/${image}:${tag} --tls-verify=false
podman push ${url}/library/${image}:${tag} --tls-verify=false
echo "$image is retag and push to docker registry"
else
echo "no tag available for $image"
fi
done
}
function validate_rke2_registry_config() {
local rancher_registries_file="/etc/rancher/rke2/registries.yaml"
local endpoint_present="false"
endpoint=$(cat < ${rancher_registries_file} | grep -A2 "docker.io:" | grep -A1 "endpoint:"|tail -n1|xargs)
if [[ -n "${endpoint}" ]]; then
endpoint_present="true"
fi
echo "${endpoint_present}"
}
function update_rke2_registry_config() {
local DOCKER_REGISTRY_URL=$(get_docker_registry_url)
local DOCKER_REGISTRY_LOCAL_USERNAME=$(get_docker_registry_credentials username)
local DOCKER_REGISTRY_LOCAL_PASSWORD=$(get_docker_registry_credentials password)
local registriesPath="/etc/rancher/rke2/registries.yaml"
local DOCKER_REGISTRY_NODEPORT=30071
echo "Create temp file with name ${registriesPath}_tmp"
cp -r ${registriesPath} ${registriesPath}_tmp
echo "Start updating ${registriesPath}"
cat > "${registriesPath}" <<EOF
mirrors:
docker-registry.docker-registry.svc.cluster.local:5000:
endpoint:
- "https://${DOCKER_REGISTRY_URL}"
docker.io:
endpoint:
- "https://${DOCKER_REGISTRY_URL}"
${DOCKER_REGISTRY_URL}:
endpoint:
- "https://${DOCKER_REGISTRY_URL}"
configs:
"localhost:${DOCKER_REGISTRY_NODEPORT}":
tls:
insecure_skip_verify: true
auth:
username: ${DOCKER_REGISTRY_LOCAL_USERNAME}
password: ${DOCKER_REGISTRY_LOCAL_PASSWORD}
EOF
}
function is_server_node() {
[[ "$(systemctl is-enabled rke2-server 2>>/dev/null)" == "enabled" ]] && echo -n "true" && return
echo "false"
}
function main() {
local is_server_node=$(is_server_node)
local install_type=$(get_cluster_config "INSTALL_TYPE")
if [[ "${install_type}" != "offline" ]]; then
echo "This script is compatible with only offline cluster. Current cluster install_type=${install_type}"
exit 0
fi
if [[ "${is_server_node}" == "true" ]]; then
echo "current node is identified as server node. Updating image tag"
update_image_tag
else
echo "current node is identified as agent node."
fi
rke2_registry_config_is_valid=$(validate_rke2_registry_config)
if [[ "${rke2_registry_config_is_valid}" == "false" ]]; then
echo "start updating rke2 config"
update_rke2_registry_config
if [[ "${is_server_node}" == "true" ]]; then
echo "Registry configuration is updated. Restarting service using command: systemctl restart rke2-server"
systemctl restart rke2-server.service
else
echo "Registry configuration is updated. Restarting service using command: systemctl restart rke2-agent"
systemctl restart rke2-agent.service
fi
else
echo "rke2 config update is not required"
fi
}
main
Note:
The
fix_image_project_id.sh
script restarts the Kubernetes server (rke2 service) and all the workloads running on the nodes.
Running the
fix_image_project_id.sh
script is required only if you use Automation Suite 2021.10.0, 2021.10.1, 2021.10.2, 2021.10.3, or 2021.10.4.