Hi,
Please, do you have any experience or tips on how to solve a problem with orphan index directories on Elasticsearch nodes running on old CentOS 7 VMs (8 cores / 32 GB RAM / 2 TB SSD)?
We run Elasticsearch version 8.12.1.
The new nodes run on the latest Rocky Linux 9.x release, the older ones on CentOS 7.
The CentOS 7 nodes keep accumulating orphan index directories which in some cases consume up to 90% of disk capacity.
GET _cat/allocation?v&s=shards
# first two nodes
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node node.role
6 75.5gb 1.7tb 173.5gb 1.9tb 91 10.44.163.25 10.44.163.25 tela15prahkz hilst
7 39.9gb 1.7tb 186.7gb 1.9tb 90 10.44.163.17 10.44.163.17 tela07prahkz hilst
You can see that indices consume only 75.5 GB, yet 1.7 TB of the disk is used.
When I check the disk directly, the space is consumed by index directories under the Elasticsearch data path.
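This is roughly how I verify it on the node itself (the data path is the same one used in the script below; adjust if yours differs):

# largest index directories on disk
du -sh /data/elasticsearch/indices/* | sort -rh | head -20

# what the cluster thinks is allocated, per shard
GET _cat/shards?v&h=index,shard,prirep,store,node&s=store:desc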
I created the script below to detect orphan directories, and it found many of them.
When I delete them with the script, Elasticsearch keeps creating new orphans.
We do not see this behaviour on Rocky Linux 9, only on the CentOS 7 nodes.
We plan to upgrade / replace the OS, but I am not sure whether we have missed any other options.
Do you have any advice on why this is happening?
Can I safely delete the orphans this way? (I run the script on dev, but I plan to run it on prod; see also the dangling-indices check sketched below the questions.)
What can cause this problem?
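One thing I am not sure about is whether Elasticsearch would report these directories through the dangling indices API, which would let the cluster remove them instead of an rm -rf on disk. A check I plan to try (same host and credentials as in the script below):

# list indices that exist on disk but are not part of the cluster state
curl -s -u "elastic:secretpassword" "http://localhost:9200/_dangling?pretty"

# if an orphan shows up there, it can be removed through the API instead of rm -rf:
# curl -s -u "elastic:secretpassword" -X DELETE "http://localhost:9200/_dangling/<index-uuid>?accept_data_loss=true"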
Thanks for any advice.

Script to detect and delete orphans:
#!/bin/bash
set -o pipefail  # without this, a failed curl in the curl | jq pipeline below would go unnoticed
# Define paths
ES_NODE_DATA_PATH="/data/elasticsearch/indices"
ES_HOST="http://localhost:9200"
TMP_DIR="/tmp"
ES_USER="elastic"
ES_PASS="secretpassword"
# Get the list of UUIDs from Elasticsearch
curl -s -u "${ES_USER}:${ES_PASS}" "${ES_HOST}/_cat/indices?format=json&h=uuid" | jq -r '.[].uuid' > "${TMP_DIR}/elasticsearch_uuids.txt"
# Abort on a fetch failure or an empty UUID list, otherwise every directory would look orphaned
if [ $? -ne 0 ] || [ ! -s "${TMP_DIR}/elasticsearch_uuids.txt" ]; then
    echo "Error: Failed to fetch index UUIDs from Elasticsearch."
    exit 1
fi
# List all directories in the Elasticsearch data path
ls -1 "${ES_NODE_DATA_PATH}" > "${TMP_DIR}/disk_uuids.txt"
# Compare the lists to find orphaned UUIDs
comm -23 <(sort "${TMP_DIR}/disk_uuids.txt") <(sort "${TMP_DIR}/elasticsearch_uuids.txt") > "${TMP_DIR}/orphaned_uuids.txt"
# Initialize total size variable
total_size=0
# Check the size of each orphaned directory and delete them
echo "Orphaned UUIDs, their sizes, and deletion status:"
while IFS= read -r uuid; do
    dir="${ES_NODE_DATA_PATH}/${uuid}"
    if [ -d "$dir" ]; then
        size=$(du -sb "$dir" | cut -f1)       # Size in bytes
        size_human=$(du -sh "$dir" | cut -f1) # Human-readable size
        echo "$uuid: $size_human"
        echo "$dir"
        # Delete the orphaned directory: uncomment the block below when you are sure
        # if rm -rf "$dir"; then
        #     echo "$uuid: Directory deleted successfully."
        # else
        #     echo "$uuid: Failed to delete directory."
        # fi
        # Update total size of orphaned directories found
        total_size=$((total_size + size))
    else
        echo "$uuid: Directory does not exist"
    fi
done < "${TMP_DIR}/orphaned_uuids.txt"
# Print total size of orphaned directories found (deleted, once the rm is enabled)
echo "Total size of orphaned directories: $(numfmt --to=iec-i --suffix=B "$total_size")"
# Clean up temporary files
rm -f "${TMP_DIR}/elasticsearch_uuids.txt" "${TMP_DIR}/disk_uuids.txt" "${TMP_DIR}/orphaned_uuids.txt"
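Usage sketch, dry run first on one node (the filename is just what I call the script locally):

chmod +x find_orphan_indices.sh
./find_orphan_indices.sh | tee "/tmp/orphan_report_$(hostname).txt"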