Storage reclamation patch

Description

To ensure high performance and prevent outages, we optimize the cluster storage for Automation Suite by reclaiming unused storage. However, ocasionally some of the actively used storage may be reclaimed, leading to service impact for multi-node clusters and possible data loss for single-node clusters.

The following versions are affected by this issue:

2021.10.3 to 2021.10.10
2022.4.8 and earlier
2022.10.7 and earlier
2023.4.2 and earlier

Solution

To make sure actively used storage on nodes is not reclaimed by Automation Suite, run the following script:

#!/bin/bash
echo ""
echo "Starting Storage Reclamation Patch"
echo ""
echo "Checking that this is a server node"

if [ $(sudo systemctl is-enabled rke2-server) ]; then
    echo "  This is a server node"
    echo ""
else
    echo "  FATAL: This is not a server node"
    echo "  This script should only be run on a server node"
    echo "Exiting script"
    echo ""
    exit 1
fi

echo "Generating patch.yaml file at: /tmp/patch.yaml"

if [ -f /tmp/patch.yaml ]; then
    echo "  FATAL: Patch file: /tmp/patch.yaml file already exists"
    echo "  Remove existing /tmp/patch.yaml file and re-run script"
    echo "  Command to remove file: sudo rm -rf /tmp/patch.yaml"
    echo "Exiting script"
    echo ""
    exit 1
fi

sudo cat <<'EOF' > /tmp/patch.yaml
spec:
  template:
    spec:
      containers:
      - name: longhorn-replica-folder-cleanup
        args:
        - /host
        - /bin/bash
        - -ec
        - |
          while true;
          do
            set -o pipefail
            export KUBECONFIG=/etc/rancher/rke2/rke2.yaml PATH=$PATH:/var/lib/rancher/rke2/bin:$INSTALLER_PATH
            which kubectl >> /dev/null || {
              echo "kubectl not found"
              exit 1
            }
            which jq >> /dev/null || {
              echo "jq not found"
              exit 1
            }
            directories=$(find ${LONGHORN_DISK_PATH}/replicas/ -maxdepth 1 -mindepth 1  -type d)
            for dir in $directories;
            do
              basename=$(basename "$dir")
              volume_name=${basename%-*}
              replica_name=$(kubectl -n longhorn-system get replicas.longhorn.io -o json | jq --arg dir "$basename" '.items[] | select(.spec.dataDirectoryName==$dir) | .metadata.name')
              if kubectl -n longhorn-system get volumes.longhorn.io "$volume_name" &>/dev/null;
              then
                if [[ -z ${replica_name} ]];
                then
                  robust_status=$(kubectl -n longhorn-system get volumes.longhorn.io "$volume_name" -o jsonpath='{.status.robustness}')
                  if [[ "${robust_status}" == "healthy" || "${robust_status}" == "degraded" ]];
                  then
                    echo "Replica not found but Volume found with a valid status (robust status ${robust_status}). Data directory $dir can be deleted"
                    rm -rf $dir
                  else
                    echo "Replica not found but Volume found with robust status ${robust_status}. Need to check if there is still a valid replica before deleting data directory $dir so that the directory is not required for recovery"
                  fi
                else
                  echo "Volume found and there is a replica using the data directory $dir"
                fi
              else
                if kubectl -n longhorn-system get volumes.longhorn.io "$volume_name" 2>&1 | grep "NotFound";
                then
                  echo "Volume object not found. Data directory $dir can be deleted."
                  rm -rf $dir
                else
                  echo "Could not fetch volume for $dir"
                fi
              fi
            done
            sleep 600
          done
EOF

# Checker to see if patch.yaml file was created
if [ -f /tmp/patch.yaml ]; then
    echo "  /tmp/patch.yaml file created"
    echo ""
else
    echo "  FATAL: /tmp/patch.yaml file not created"
    echo "  Previous command did not run successfully. Try running: sudo touch /tmp/patch.yaml to see why the command failed to generate the file /tmp/patch.yaml"
    echo "  If help is needed, please contact UiPath Support"
    echo "Exiting script"
    echo ""
    exit 1
fi

echo "Applying patch.yaml file"

echo '  Executing the command: sudo /var/lib/rancher/rke2/bin/kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml -n kube-system patch daemonset longhorn-replica-folder-cleanup --patch "$(cat /tmp/patch.yaml)"'
sudo /var/lib/rancher/rke2/bin/kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml -n kube-system patch daemonset longhorn-replica-folder-cleanup --patch "$(cat /tmp/patch.yaml)" 1>/dev/null
exit_code=$?

echo ""

echo "Checking that the patch was applied"
if [ $exit_code -eq 0 ]; then
    echo " Patch was applied successfully"
    echo ""
else
    echo "  FATAL: Patch was not applied successfully"
    echo "  Previous command did not run successfully. Check previous errors and Contact UiPath Support for help."
    echo "  Before re-running script please run this command: sudo rm -f /tmp/patch.yaml"
    echo "Exiting script"
    echo ""
    exit 1
fi

echo "Removing /tmp/patch.yaml file"
echo ""
sudo rm -rf /tmp/patch.yaml

echo "System is patched. Make sure to run this tool in all environments on any master node"
echo ""#!/bin/bash
echo ""
echo "Starting Storage Reclamation Patch"
echo ""
echo "Checking that this is a server node"

if [ $(sudo systemctl is-enabled rke2-server) ]; then
    echo "  This is a server node"
    echo ""
else
    echo "  FATAL: This is not a server node"
    echo "  This script should only be run on a server node"
    echo "Exiting script"
    echo ""
    exit 1
fi

echo "Generating patch.yaml file at: /tmp/patch.yaml"

if [ -f /tmp/patch.yaml ]; then
    echo "  FATAL: Patch file: /tmp/patch.yaml file already exists"
    echo "  Remove existing /tmp/patch.yaml file and re-run script"
    echo "  Command to remove file: sudo rm -rf /tmp/patch.yaml"
    echo "Exiting script"
    echo ""
    exit 1
fi

sudo cat <<'EOF' > /tmp/patch.yaml
spec:
  template:
    spec:
      containers:
      - name: longhorn-replica-folder-cleanup
        args:
        - /host
        - /bin/bash
        - -ec
        - |
          while true;
          do
            set -o pipefail
            export KUBECONFIG=/etc/rancher/rke2/rke2.yaml PATH=$PATH:/var/lib/rancher/rke2/bin:$INSTALLER_PATH
            which kubectl >> /dev/null || {
              echo "kubectl not found"
              exit 1
            }
            which jq >> /dev/null || {
              echo "jq not found"
              exit 1
            }
            directories=$(find ${LONGHORN_DISK_PATH}/replicas/ -maxdepth 1 -mindepth 1  -type d)
            for dir in $directories;
            do
              basename=$(basename "$dir")
              volume_name=${basename%-*}
              replica_name=$(kubectl -n longhorn-system get replicas.longhorn.io -o json | jq --arg dir "$basename" '.items[] | select(.spec.dataDirectoryName==$dir) | .metadata.name')
              if kubectl -n longhorn-system get volumes.longhorn.io "$volume_name" &>/dev/null;
              then
                if [[ -z ${replica_name} ]];
                then
                  robust_status=$(kubectl -n longhorn-system get volumes.longhorn.io "$volume_name" -o jsonpath='{.status.robustness}')
                  if [[ "${robust_status}" == "healthy" || "${robust_status}" == "degraded" ]];
                  then
                    echo "Replica not found but Volume found with a valid status (robust status ${robust_status}). Data directory $dir can be deleted"
                    rm -rf $dir
                  else
                    echo "Replica not found but Volume found with robust status ${robust_status}. Need to check if there is still a valid replica before deleting data directory $dir so that the directory is not required for recovery"
                  fi
                else
                  echo "Volume found and there is a replica using the data directory $dir"
                fi
              else
                if kubectl -n longhorn-system get volumes.longhorn.io "$volume_name" 2>&1 | grep "NotFound";
                then
                  echo "Volume object not found. Data directory $dir can be deleted."
                  rm -rf $dir
                else
                  echo "Could not fetch volume for $dir"
                fi
              fi
            done
            sleep 600
          done
EOF

# Checker to see if patch.yaml file was created
if [ -f /tmp/patch.yaml ]; then
    echo "  /tmp/patch.yaml file created"
    echo ""
else
    echo "  FATAL: /tmp/patch.yaml file not created"
    echo "  Previous command did not run successfully. Try running: sudo touch /tmp/patch.yaml to see why the command failed to generate the file /tmp/patch.yaml"
    echo "  If help is needed, please contact UiPath Support"
    echo "Exiting script"
    echo ""
    exit 1
fi

echo "Applying patch.yaml file"

echo '  Executing the command: sudo /var/lib/rancher/rke2/bin/kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml -n kube-system patch daemonset longhorn-replica-folder-cleanup --patch "$(cat /tmp/patch.yaml)"'
sudo /var/lib/rancher/rke2/bin/kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml -n kube-system patch daemonset longhorn-replica-folder-cleanup --patch "$(cat /tmp/patch.yaml)" 1>/dev/null
exit_code=$?

echo ""

echo "Checking that the patch was applied"
if [ $exit_code -eq 0 ]; then
    echo " Patch was applied successfully"
    echo ""
else
    echo "  FATAL: Patch was not applied successfully"
    echo "  Previous command did not run successfully. Check previous errors and Contact UiPath Support for help."
    echo "  Before re-running script please run this command: sudo rm -f /tmp/patch.yaml"
    echo "Exiting script"
    echo ""
    exit 1
fi

echo "Removing /tmp/patch.yaml file"
echo ""
sudo rm -rf /tmp/patch.yaml

echo "System is patched. Make sure to run this tool in all environments on any master node"
echo ""

Running the script:

To run the script, take the following steps:

Copy the script to one of the automation suite server nodes and name it storageReclamationPatch.sh.
Change the permissions to make the script executable:
```
chmod 755 storageReclamationPatch.sh
```
Execute the script:
1. Make sure that scripting is enabled so that the execution and output of the script can be captured in a log file. If any issue in case any issue is encountered, this will help our support team diagnose the issue.
2. To start the scripting program and the execute the patch script run the following command:
  script storageReclamationPatch.log ./storageReclamationPatch.sh
Use the exit command to exit the script and to generate the storageReclamationPatch.log log file. If you encounter any issues during this stage, share them with our support team.

Example of running the script:

[admin_1@autosuite storageReclamationPatch]$ script storageReclamationPatch.log
Script started, file is storageReclamationPatch.log
[admin_1@autosuite storageReclamationPatch]$ ./storageReclamationPatch.sh
//Script executes, maybe some debugging is done
[admin_1@autosuite storageReclamationPatch]$ exit
exit
Script done, file is storageReclamationPatch.log[admin_1@autosuite storageReclamationPatch]$ script storageReclamationPatch.log
Script started, file is storageReclamationPatch.log
[admin_1@autosuite storageReclamationPatch]$ ./storageReclamationPatch.sh
//Script executes, maybe some debugging is done
[admin_1@autosuite storageReclamationPatch]$ exit
exit
Script done, file is storageReclamationPatch.log

If executed successfully, the script displays the following message:

Checking that the patch was applied Patch was applied successfully

Note:

The script only needs to be executed on one server node per environment. If the script fails or an unresolvable issue is encountered, share the storageReclamationPatch.log file with support.

On this page

Description
Solution

Was this page helpful?

PREVIOUSFailure to create persistent volumes

NEXTAuthentication troubleshooting

Support and Services

Get The Help You Need

UiPath Academy

Learning RPA - Automation Courses

UiPath Forum

UiPath Community Forum

Trust and Security

Cookies Policy