Automation Suite
2022.4
Missing Self-heal-operator and Sf-k8-utils repo
Automation Suite Installation Guide
Last updated: Apr 24, 2024
This issue causes workloads to enter the ImagePullBackOff or ErrImagePull state with one of the following errors:
Failed to pull image "sf-k8-utils-rhel:<tag>": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/sf-k8-utils-rhel:<tag>": failed to resolve reference "docker.io/library/sf-k8-utils-rhel:<tag>": failed to do request: Head "https://localhost:30071/v2/library/sf-k8-utils-rhel/manifests/<tag>?ns=docker.io": dial tcp [::1]:30071: connect: connection refused
OR
Failed to pull image "self-heal-operator:<tag>": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/self-heal-operator:<tag>": failed to resolve reference "docker.io/library/self-heal-operator:<tag>": failed to do request: Head "https://localhost:30071/v2/library/self-heal-operator/manifests/<tag>?ns=docker.io": dial tcp [::1]:30071: connect: connection refused
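Before applying any fix, it can help to confirm which workloads are affected. The sketch below is an illustration and not part of the official procedure: `filter_image_pull_failures` is a hypothetical helper name, and it assumes the default column layout of `kubectl get pods -A --no-headers` (namespace, name, ready, status, and so on).

```shell
#!/bin/bash
# Hypothetical helper (not part of the Automation Suite tooling).
# Reads `kubectl get pods -A --no-headers` output on stdin and prints
# "<namespace>/<pod>" for every pod whose STATUS column ends in
# ImagePullBackOff or ErrImagePull (this also catches Init:ErrImagePull).
filter_image_pull_failures() {
  awk '$4 ~ /(ImagePullBackOff|ErrImagePull)$/ { print $1 "/" $2 }'
}
```

On a cluster node it could be used as `kubectl get pods -A --no-headers | filter_image_pull_failures`.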
To fix this issue, run the following script on every node in the cluster, one node at a time.
#!/bin/bash
export KUBECONFIG=${KUBECONFIG:-/etc/rancher/rke2/rke2.yaml}
export PATH=$PATH:/var/lib/rancher/rke2/bin:${SCRIPT_DIR}/Fabric_Installer/bin:/usr/local/bin
# Print the registry host:port, taken from the first key under "configs:" in registries.yaml.
function get_docker_registry_url() {
  local rancher_registries_file="/etc/rancher/rke2/registries.yaml"
  local config url
  config=$(grep -A1 "configs:" "${rancher_registries_file}" | tail -n1 | tr -d ' "')
  # Drop the trailing colon of the YAML key.
  url="${config%:}"
  echo "${url}"
}
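# Print the value of the given key (username or password) from registries.yaml.
function get_docker_registry_credentials() {
  local key="$1"
  local rancher_registries_file="/etc/rancher/rke2/registries.yaml"
  local value
  # -f2- keeps everything after the first colon, so values that contain ":" are not truncated.
  value=$(grep "$key:" "${rancher_registries_file}" | cut -d: -f2- | xargs)
  echo "${value}"
}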
# Print the decoded value of a key from the service-cluster-configurations secret.
function get_cluster_config() {
  local key=$1
  local value
  # The "if" in the go template prevents printing <no value> instead of an empty string.
  value=$(kubectl get secret service-cluster-configurations \
    -o "go-template={{if index .data \"${key^^}\"}}{{index .data \"${key^^}\"}}{{end}}" \
    -n uipath-infra --ignore-not-found) || true
  echo -n "$(base64 -d <<<"$value")"
}
# Retag both images under the uipath/ and library/ projects and push them back to the registry.
function update_image_tag() {
  local username password url image tag
  username=$(get_docker_registry_credentials username)
  password=$(get_docker_registry_credentials password)
  url=$(get_docker_registry_url)
  local images=(self-heal-operator sf-k8-utils-rhel)
  for image in "${images[@]}"; do
    echo "Start checking available $image tag"
    tag=$(curl -u "${username}:${password}" -X GET "https://${url}/v2/${image}/tags/list" -k -q -s | jq -rc '.tags[0]')
    # Skip the image when no tag exists (jq prints "null") or the request failed (empty output).
    if [[ -n "${tag}" && "${tag}" != "null" ]]; then
      echo "$image with tag ${tag} found..."
      podman login "${url}" --username "${username}" --password "${password}" --tls-verify=false
      podman pull "${url}/${image}:${tag}" --tls-verify=false
      podman tag "${url}/${image}:${tag}" "${url}/uipath/${image}:${tag}"
      podman tag "${url}/${image}:${tag}" "${url}/library/${image}:${tag}"
      podman push "${url}/uipath/${image}:${tag}" --tls-verify=false
      podman push "${url}/library/${image}:${tag}" --tls-verify=false
      echo "$image has been retagged and pushed to the docker registry"
    else
      echo "no tag available for $image"
    fi
  done
}
# Print "true" when registries.yaml already defines a mirror endpoint for docker.io.
function validate_rke2_registry_config() {
  local rancher_registries_file="/etc/rancher/rke2/registries.yaml"
  local endpoint_present="false"
  local endpoint
  endpoint=$(grep -A2 "docker.io:" "${rancher_registries_file}" | grep -A1 "endpoint:" | tail -n1 | xargs)
  if [[ -n "${endpoint}" ]]; then
    endpoint_present="true"
  fi
  echo "${endpoint_present}"
}
# Back up registries.yaml, then rewrite it so that docker.io is mirrored to the local registry.
function update_rke2_registry_config() {
  local DOCKER_REGISTRY_URL=$(get_docker_registry_url)
  local DOCKER_REGISTRY_LOCAL_USERNAME=$(get_docker_registry_credentials username)
  local DOCKER_REGISTRY_LOCAL_PASSWORD=$(get_docker_registry_credentials password)
  local registriesPath="/etc/rancher/rke2/registries.yaml"
  local DOCKER_REGISTRY_NODEPORT=30071
  echo "Create temp file with name ${registriesPath}_tmp"
  cp "${registriesPath}" "${registriesPath}_tmp"
  echo "Start updating ${registriesPath}"
  # The heredoc body and its EOF terminator must start at column 0 so that the YAML nesting is preserved.
  cat > "${registriesPath}" <<EOF
mirrors:
  docker-registry.docker-registry.svc.cluster.local:5000:
    endpoint:
      - "https://${DOCKER_REGISTRY_URL}"
  docker.io:
    endpoint:
      - "https://${DOCKER_REGISTRY_URL}"
  ${DOCKER_REGISTRY_URL}:
    endpoint:
      - "https://${DOCKER_REGISTRY_URL}"
configs:
  "localhost:${DOCKER_REGISTRY_NODEPORT}":
    tls:
      insecure_skip_verify: true
    auth:
      username: ${DOCKER_REGISTRY_LOCAL_USERNAME}
      password: ${DOCKER_REGISTRY_LOCAL_PASSWORD}
EOF
}
# Print "true" on an RKE2 server node (rke2-server unit enabled), otherwise "false".
function is_server_node() {
  [[ "$(systemctl is-enabled rke2-server 2>/dev/null)" == "enabled" ]] && echo -n "true" && return
  echo "false"
}
# Entry point: offline clusters only. Retag images on server nodes, then fix registries.yaml if needed.
function main() {
  local is_server_node=$(is_server_node)
  local install_type=$(get_cluster_config "INSTALL_TYPE")
  if [[ "${install_type}" != "offline" ]]; then
    echo "This script is compatible with only offline cluster. Current cluster install_type=${install_type}"
    exit 0
  fi
  if [[ "${is_server_node}" == "true" ]]; then
    echo "current node is identified as server node. Updating image tag"
    update_image_tag
  else
    echo "current node is identified as agent node."
  fi
  rke2_registry_config_is_valid=$(validate_rke2_registry_config)
  if [[ "${rke2_registry_config_is_valid}" == "false" ]]; then
    echo "start updating rke2 config"
    update_rke2_registry_config
    if [[ "${is_server_node}" == "true" ]]; then
      echo "Registry configuration is updated. Restarting service using command: systemctl restart rke2-server"
      systemctl restart rke2-server.service
    else
      echo "Registry configuration is updated. Restarting service using command: systemctl restart rke2-agent"
      systemctl restart rke2-agent.service
    fi
  else
    echo "rke2 config update is not required"
  fi
}
main
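Because the script rewrites registries.yaml and may restart the RKE2 service, it can be useful to preview what it would detect on a node first. The following read-only sketch reuses the script's parsing approach under stated assumptions: `preview_registry_config` is a hypothetical name, and the optional path argument is an addition that allows checking a copy of the file.

```shell
#!/bin/bash
# Hypothetical read-only preview (not part of the official script).
# $1: path to a registries.yaml file; defaults to the RKE2 location.
preview_registry_config() {
  local file="${1:-/etc/rancher/rke2/registries.yaml}"
  local url endpoint
  # Registry host:port is the first key under "configs:", minus quotes and trailing colon.
  url=$(grep -A1 "configs:" "${file}" | tail -n1 | tr -d ' "')
  url="${url%:}"
  # A docker.io mirror counts as present when an entry follows its "endpoint:" line.
  endpoint=$(grep -A2 "docker.io:" "${file}" | grep -A1 "endpoint:" | tail -n1 | tr -d '[:space:]')
  echo "registry_url=${url}"
  if [[ -n "${endpoint}" ]]; then
    echo "docker_io_mirror=present"
  else
    echo "docker_io_mirror=missing"
  fi
}
```

If the preview reports `docker_io_mirror=missing`, the script above would rewrite the file and restart the RKE2 service on that node. If it reports `present`, the script would leave the registry configuration unchanged.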
Note:
The fix_image_project_id.sh script restarts the Kubernetes server (rke2 service) and all the workloads running on the nodes.
Running the fix_image_project_id.sh script is required only if you use Automation Suite 2021.10.0, 2021.10.1, 2021.10.2, 2021.10.3, or 2021.10.4.
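The version condition in the note can be expressed as a small guard. This is a sketch only: `needs_image_project_fix` is a hypothetical function name, and how the installed Automation Suite version is obtained is not covered here, so the version string is passed in as an argument.

```shell
#!/bin/bash
# Hypothetical guard (not part of the official tooling).
# Returns 0 when the given Automation Suite version is one of the
# releases listed in the note, and 1 otherwise.
needs_image_project_fix() {
  case "$1" in
    2021.10.0|2021.10.1|2021.10.2|2021.10.3|2021.10.4) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `needs_image_project_fix "2021.10.3" && echo "run the script"` prints the message, while the same call with `"2022.4.0"` prints nothing.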