Automation Suite
2022.4
false
重要 :
请注意此内容已使用机器翻译进行了部分本地化。
Automation Suite 安装指南
Last updated 2024年7月12日

由于“快照过多”错误,备份失败

描述

在备份期间,通过获取卷快照并将其发送到远程位置来备份 Longhorn 卷。如果快照创建问题影响 Longhorn 卷,且该卷的快照计数高于 248,则备份将不会成功。

您可以通过以下方式验证备份是否由于 TooManySnapshots 错误而失败:
  • 通过检查 Velero 日志:

    kubectl logs  -n velero -l app.kubernetes.io/name=velero -c velero |grep "Waiting for volumesnapshotcontents" kubectl logs  -n velero -l app.kubernetes.io/name=velero -c velero |grep "Waiting for volumesnapshotcontents"

    示例输出:

    time="2023-12-15T08:15:59Z" level=info msg="Waiting for volumesnapshotcontents snapcontent-f073b88e-bd01-40a8-a645-ec929e276cef to have snapshot handle. Retrying in 5s" backup=velero/daily-2 cmd=/plugins/velero-plugin-for-csi logSource="/go/src/velero-plugin-for-csi/internal/util/util.go:182" pluginName=velero-plugin-for-csi
    time="2023-12-15T08:15:59Z" level=info msg="Waiting for volumesnapshotcontents snapcontent-f073b88e-bd01-40a8-a645-ec929e276cef to have snapshot handle. Retrying in 5s" backup=velero/daily-2 cmd=/plugins/velero-plugin-for-csi logSource="/go/src/velero-plugin-for-csi/internal/util/util.go:182" pluginName=velero-plugin-for-csitime="2023-12-15T08:15:59Z" level=info msg="Waiting for volumesnapshotcontents snapcontent-f073b88e-bd01-40a8-a645-ec929e276cef to have snapshot handle. Retrying in 5s" backup=velero/daily-2 cmd=/plugins/velero-plugin-for-csi logSource="/go/src/velero-plugin-for-csi/internal/util/util.go:182" pluginName=velero-plugin-for-csi
    time="2023-12-15T08:15:59Z" level=info msg="Waiting for volumesnapshotcontents snapcontent-f073b88e-bd01-40a8-a645-ec929e276cef to have snapshot handle. Retrying in 5s" backup=velero/daily-2 cmd=/plugins/velero-plugin-for-csi logSource="/go/src/velero-plugin-for-csi/internal/util/util.go:182" pluginName=velero-plugin-for-csi
  • 通过检查 Longhorn Pod 日志:

    kubectl logs  -n longhorn-system -l app=csi-snapshotter --tail=-1 |grep "too many snapshots created"kubectl logs  -n longhorn-system -l app=csi-snapshotter --tail=-1 |grep "too many snapshots created"

    示例输出:

    I1215 08:39:41.707351       1 snapshot_controller.go:291] createSnapshotWrapper: CreateSnapshot for content snapcontent-f073b88e-bd01-40a8-a645-ec929e276cef returned error: rpc error: code = Internal desc = Bad response statusCode [500]. Status [500 Internal Server Error]. Body: [code=Server Error, detail=, message=failed to create snapshot: proxyServer=10.42.7.56:8501 destination=10.42.7.56:10004: failed to snapshot volume: rpc error: code = Unknown desc = failed to create snapshot snapshot-f073b88e-bd01-40a8-a645-ec929e276cef for volume 10.42.7.56:10004: rpc error: code = Unknown desc = too many snapshots created] from [http://longhorn-backend:9500/v1/volumes/pvc-7d89efa4-3d60-4837-a632-f190cd3cd9ed?action=snapshotCreate]I1215 08:39:41.707351       1 snapshot_controller.go:291] createSnapshotWrapper: CreateSnapshot for content snapcontent-f073b88e-bd01-40a8-a645-ec929e276cef returned error: rpc error: code = Internal desc = Bad response statusCode [500]. Status [500 Internal Server Error]. Body: [code=Server Error, detail=, message=failed to create snapshot: proxyServer=10.42.7.56:8501 destination=10.42.7.56:10004: failed to snapshot volume: rpc error: code = Unknown desc = failed to create snapshot snapshot-f073b88e-bd01-40a8-a645-ec929e276cef for volume 10.42.7.56:10004: rpc error: code = Unknown desc = too many snapshots created] from [http://longhorn-backend:9500/v1/volumes/pvc-7d89efa4-3d60-4837-a632-f190cd3cd9ed?action=snapshotCreate]

解决方案

如果遇到 TooManySnapshots 错误,则必须通过运行以下脚本来清理所有卷的快照。有关如何自动执行此操作的详细信息,请参阅如何自动清理 Longhorn 快照
#!/bin/bash
set -e

# longhorn backend URL
url=
# By default, snapshot older than 10 days will be deleted
days=10

function display_usage() {
	echo "usage: $(basename "$0") [-h] -u longhorn-url -d days"
	echo "  -u	Longhorn URL"
	echo "  -d 	Number of days(should be >0). By default, script will delete snapshot older than 10 days."
	echo "  -h	Print help"
}

while getopts 'hd:u:' flag "$@"; do
	case "${flag}" in
		u)
			url=${OPTARG}
			;;
		d)
			days=${OPTARG}
			[ "$days" ] && [ -z "${days//[0-9]}" ] || { echo "Invalid number of days=$days"; exit 1; }
			;;
		h)
			display_usage
			exit 0
			;;
		:)
			echo "Invalid option: ${OPTARG} requires an argument."
			exit 1
			;;
		*)
			echo "Unexpected option ${flag}"
			exit 1
			;;
	esac
done

[[ -z "$url" ]] && echo "Missing longhorn URL" && exit 1

# check if URL is valid
curl -s --connect-timeout 30 ${url}/v1 >> /dev/null || { echo "Unable to connect to longhorn backend"; exit 1; }

echo "Deleting snapshots older than $days days"

# Fetch list of longhorn volumes
vols=$( (curl -s -X GET ${url}/v1/volumes |jq -r '.data[].name') )

#delete given snapshot for given volume
function delete_snapshot() {
	local vol=$1
	local snap=$2

	[[ -z "$vol" || -z "$snap" ]] && echo "Error: delete_snapshot: Empty argument" && return 1
	curl -s -X POST ${url}/v1/volumes/${vol}?action=snapshotDelete -d '{"name": "'$snap'"}'
	echo "Snapshot=$snap deleted for volume=$vol"
}

#perform cleanup for given volume
function cleanup_volume() {
	local vol=$1
	local deleted_snap=0

	[[ -z "$vol" ]] && echo "Error: cleanup_volume: Empty argument" && return 1

	# fetch list of snapshot
	snaps=$( (curl -s -X POST ${url}/v1/volumes/${vol}?action=snapshotList | jq  -r '.data[] | select(.usercreated==true) | .name' ) )
	for i in ${snaps[@]}; do
		if [[ $i == "volume-head" ]]; then
			continue
		fi

		# calculate date difference for snapshot
		snapTime=$(curl -s -X POST ${url}/v1/volumes/${vol}?action=snapshotGet -d '{"name":"'$i'"}' |jq -r '.created')
		currentTime=$(date "+%s")
		timeDiff=$(($currentTime - ($(date -d $snapTime "+%s")) / 86400))
		if [[ $timeDiff -lt $days ]]; then
			echo "Ignoring snapshot $i, since it is older than $timeDiff days"
			continue
		fi

		#trigger deletion for snapshot
		delete_snapshot $vol $i
		deleted_snap=$((deleted_snap+1))
	done

	if [[ "$deleted_snap" -gt 0 ]]; then
		#trigger purge for volume
		curl -s -X POST ${url}/v1/volumes/${vol}?action=snapshotPurge >> /dev/null
	fi

}

for i in ${vols[@]}; do
	cleanup_volume $i
done#!/bin/bash
set -e

# longhorn backend URL
url=
# By default, snapshot older than 10 days will be deleted
days=10

function display_usage() {
	echo "usage: $(basename "$0") [-h] -u longhorn-url -d days"
	echo "  -u	Longhorn URL"
	echo "  -d 	Number of days(should be >0). By default, script will delete snapshot older than 10 days."
	echo "  -h	Print help"
}

while getopts 'hd:u:' flag "$@"; do
	case "${flag}" in
		u)
			url=${OPTARG}
			;;
		d)
			days=${OPTARG}
			[ "$days" ] && [ -z "${days//[0-9]}" ] || { echo "Invalid number of days=$days"; exit 1; }
			;;
		h)
			display_usage
			exit 0
			;;
		:)
			echo "Invalid option: ${OPTARG} requires an argument."
			exit 1
			;;
		*)
			echo "Unexpected option ${flag}"
			exit 1
			;;
	esac
done

[[ -z "$url" ]] && echo "Missing longhorn URL" && exit 1

# check if URL is valid
curl -s --connect-timeout 30 ${url}/v1 >> /dev/null || { echo "Unable to connect to longhorn backend"; exit 1; }

echo "Deleting snapshots older than $days days"

# Fetch list of longhorn volumes
vols=$( (curl -s -X GET ${url}/v1/volumes |jq -r '.data[].name') )

#delete given snapshot for given volume
function delete_snapshot() {
	local vol=$1
	local snap=$2

	[[ -z "$vol" || -z "$snap" ]] && echo "Error: delete_snapshot: Empty argument" && return 1
	curl -s -X POST ${url}/v1/volumes/${vol}?action=snapshotDelete -d '{"name": "'$snap'"}'
	echo "Snapshot=$snap deleted for volume=$vol"
}

#perform cleanup for given volume
function cleanup_volume() {
	local vol=$1
	local deleted_snap=0

	[[ -z "$vol" ]] && echo "Error: cleanup_volume: Empty argument" && return 1

	# fetch list of snapshot
	snaps=$( (curl -s -X POST ${url}/v1/volumes/${vol}?action=snapshotList | jq  -r '.data[] | select(.usercreated==true) | .name' ) )
	for i in ${snaps[@]}; do
		if [[ $i == "volume-head" ]]; then
			continue
		fi

		# calculate date difference for snapshot
		snapTime=$(curl -s -X POST ${url}/v1/volumes/${vol}?action=snapshotGet -d '{"name":"'$i'"}' |jq -r '.created')
		currentTime=$(date "+%s")
		timeDiff=$(($currentTime - ($(date -d $snapTime "+%s")) / 86400))
		if [[ $timeDiff -lt $days ]]; then
			echo "Ignoring snapshot $i, since it is older than $timeDiff days"
			continue
		fi

		#trigger deletion for snapshot
		delete_snapshot $vol $i
		deleted_snap=$((deleted_snap+1))
	done

	if [[ "$deleted_snap" -gt 0 ]]; then
		#trigger purge for volume
		curl -s -X POST ${url}/v1/volumes/${vol}?action=snapshotPurge >> /dev/null
	fi

}

for i in ${vols[@]}; do
	cleanup_volume $i
done

执行上述脚本时,必须传递以下参数:

  • -u - Longhorn 后端 URL。要获取 Longhorn 后端 URL,请运行以下命令:
    kubectl get svc -n longhorn-system longhorn-backend -o json | jq -r '.spec | (.clusterIP|tostring) + ":" + (.ports[0].port|tostring)' to fetch URLkubectl get svc -n longhorn-system longhorn-backend -o json | jq -r '.spec | (.clusterIP|tostring) + ":" + (.ports[0].port|tostring)' to fetch URL
  • -d - 删除快照前的天数。
备注:
要获取受 TooManySnapshots 错误影响的卷的列表,请运行以下命令:
kubectl get volumes -n longhorn-system -o json | jq -r '.items[] | select(([ .status.conditions[] | select(.type == "toomanysnapshots" and .status == "True") ] | length ) == 1 ) | .metadata.name'kubectl get volumes -n longhorn-system -o json | jq -r '.items[] | select(([ .status.conditions[] | select(.type == "toomanysnapshots" and .status == "True") ] | length ) == 1 ) | .metadata.name'
  • 描述
  • 解决方案

此页面有帮助吗?

获取您需要的帮助
了解 RPA - 自动化课程
UiPath Community 论坛
Uipath Logo White
信任与安全
© 2005-2024 UiPath。保留所有权利。