How to optimize a snapshot repository's indices/ directory

Hi,
We have a cluster with 3 master, 2 hot, 2 warm, and 2 cold nodes.
The path settings in our Elasticsearch config file look like this:

[root@cold-2 ~]# grep path /etc/elasticsearch/elasticsearch.yml        
path.data: /mnt/data/elasticsearch/data
path.logs: /mnt/data/elasticsearch/logs
path.repo: /mnt/data/es_backup

We take snapshots daily using a snapshot policy. We use an NFS share as the local backup target, and one of the cold nodes hosts the repository, as shown below:

[root@cold-2 ~]# df -h
Filesystem                                Size  Used Avail Use% Mounted on
devtmpfs                                  7.8G     0  7.8G   0% /dev
tmpfs                                     7.8G     0  7.8G   0% /dev/shm
tmpfs                                     7.8G   57M  7.7G   1% /run
tmpfs                                     7.8G     0  7.8G   0% /sys/fs/cgroup
/dev/mapper/cl-root                        92G  5.6G   86G   7% /
/dev/mapper/cl01-mnt_data                 1.2T  1.2T   92K 100% /mnt/data
/dev/sda1                                 976M  190M  720M  21% /boot
10.0.1.2:/mnt/data/sharing/es_backup  1.2T  1.2T  1.0M 100% /mnt/data/es_backup
tmpfs                                     1.6G     0  1.6G   0% /run/user/0

As you can see, our backup folder is 100% full. It has consumed the entire disk.

The cause is the indices/ directory. Let me show you:

[root@cold-2 ~]# du -sh /mnt/data/sharing/es_backup/my_snapshot/indices/
935G    /mnt/data/sharing/es_backup/my_snapshot/indices/
[root@cold-2 ~]# ls /mnt/data/sharing/es_backup/my_snapshot/indices/ | wc -l
1569

Deleting an index (with Curator) or deleting a snapshot (via the policy) has no impact on the indices/ directory. I tried deleting an index and a snapshot, but the number of items in the indices/ directory did not change (1569). I am also not sure whether I should delete directories under indices/ myself.

What should I do to manage the indices/ directory for disk optimization?

Snapshots are incremental, so even if you are deleting one you may find there's no relevant data to remove.
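
To make the "incremental" point concrete, here is a toy model, not the real repository format. It assumes a hypothetical input of "snapshot_name file_name" pairs and prints only the files that no other snapshot references, which are the only ones a delete could actually free.

```shell
# Toy model: print the files that would become unreferenced if snapshot "$1"
# were deleted. Input on stdin is one "snapshot_name file_name" pair per line.
# This pair format is an assumption for illustration only; it is NOT how
# Elasticsearch stores its repository metadata.
freed_by_delete() {
    awk -v target="$1" '
        { refs[$2]++; if ($1 == target) mine[$2] = 1 }   # count references per file
        END { for (f in mine) if (refs[f] == 1) print f } # keep files only the target uses
    ' | sort
}
```

With two snapshots that share most files, deleting the older one frees nothing in this model, because the newer snapshot still references every file the older one used.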

How many snapshots do you have in the repo?

Hi @warkolm,
There are 14 snapshots. See:

[root@cold-2 my_snapshot]# curl -XGET -u 'user:password' http://10.0.1.2:9200/_snapshot/es_backup/_all | jq -r .snapshots[0:][].snapshot | wc -l
14

Snapshots are stored incrementally in the shared folder, /mnt/data/sharing/es_backup/my_snapshot/. See:

[root@cold-2 my_snapshot]# ls
index-639                        meta-cziOd9ZNTkGtO6Sz45OxaQ.dat  meta-uhqsHhB7RIa0Jk07ywwlVg.dat  snap-cziOd9ZNTkGtO6Sz45OxaQ.dat  snap-uhqsHhB7RIa0Jk07ywwlVg.dat
index.latest                     meta-gciRSHhPRLO1i4FUcjgHNg.dat  meta-xFgwyBmAS4ytri9qXYS8OA.dat  snap-gciRSHhPRLO1i4FUcjgHNg.dat  snap-xFgwyBmAS4ytri9qXYS8OA.dat
indices                          meta-Onh42_w2TJSC3ldVMietfQ.dat  meta-xg9ZiNTsQ2K3DeQmxErUhA.dat  snap-Onh42_w2TJSC3ldVMietfQ.dat  snap-xg9ZiNTsQ2K3DeQmxErUhA.dat
meta-1VtyWmBwTm-FsuGuDzYZeg.dat  meta-orInCr0JR3SqWLvlfTI-FQ.dat  snap-1VtyWmBwTm-FsuGuDzYZeg.dat  snap-orInCr0JR3SqWLvlfTI-FQ.dat
meta-5-f2AIV1SmeOxxRc41IPUw.dat  meta-RAdLB9YRRpaO6wl3NcyYng.dat  snap-5-f2AIV1SmeOxxRc41IPUw.dat  snap-RAdLB9YRRpaO6wl3NcyYng.dat
meta-9rtRL2D8Sjqj9MwNFZASvw.dat  meta-saQNF4j5ScWE-BD8fDaj1Q.dat  snap-9rtRL2D8Sjqj9MwNFZASvw.dat  snap-saQNF4j5ScWE-BD8fDaj1Q.dat
meta-BYCOHTL5TrybF9qI0p6eTg.dat  meta-ssUOAknpQ12p4NHXNcdLzg.dat  snap-BYCOHTL5TrybF9qI0p6eTg.dat  snap-ssUOAknpQ12p4NHXNcdLzg.dat

But I am pointing at the /mnt/data/sharing/es_backup/my_snapshot/indices/ directory, because it is consuming almost all of the disk.

This directory is 935 GB. See:

[root@cold-2 my_snapshot]# du -sh indices/
935G    indices/

It contains 1569 directories:

[root@cold-2 my_snapshot]# ls indices/ | wc -l
1569

Here are a few of them:

[root@cold-2 my_snapshot]# ls indices/ | head -10
0_-AGTecSL28Da6dEwsjUw
0CBg7Ty2SemZ2vrliGae4w
0CczLnGhQA2VsAtBg4Tr_g
0cEmqo0jRI22dypbXVeSAg
0f5ylfMZQeWyN5c_w2Cy5Q
0G9FdhgAQQ-IVRMDBrtIiw
0H6RIAdBRrCtF6hgU-VeGQ
0iAeIzaPRICKcF0YePGrSQ
0IiSZaPOSFaYValNVA_UqA
0OuH6KJ-RpGzbTeqXxsJpA

My questions:
If I delete some of them, will it have a bad impact on the snapshots? And how can I control this directory's size for disk optimization?

I have to decrease the disk usage. It is at 100% now. I already have a snapshot policy that keeps only a few snapshots, but the indices/ directory keeps growing day by day. The policy has no impact on the indices/ directory. What should I do?

First things first - NEVER delete data that Elasticsearch uses directly from the filesystem. It will cause you endless issues and data loss. ALWAYS use the APIs.

TL;DR: you probably need to delete more snapshots from your repo. How often are you taking them?
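
As a rough sketch of "use the APIs", the helpers below wrap the snapshot delete endpoint and the repository cleanup endpoint. The host and repository name are copied from the curl command earlier in the thread. ES_AUTH and the function names are hypothetical, and the cleanup endpoint only exists on reasonably recent 7.x releases (7.4 and later, as far as I know), so check your version first.

```shell
# Assumptions: host and repository name are taken from the thread; ES_AUTH is a
# hypothetical variable holding "user:password".
ES_URL="${ES_URL:-http://10.0.1.2:9200}"
REPO="${REPO:-es_backup}"

# Build the REST URL for one snapshot in the repository.
snapshot_url() {
    printf '%s/_snapshot/%s/%s\n' "$ES_URL" "$REPO" "$1"
}

# Delete one snapshot through the API (never with rm on the repository files).
delete_snapshot() {
    curl -s -u "$ES_AUTH" -XDELETE "$(snapshot_url "$1")"
}

# Ask Elasticsearch to remove repository files that no snapshot references.
cleanup_repo() {
    curl -s -u "$ES_AUTH" -XPOST "$ES_URL/_snapshot/$REPO/_cleanup"
}
```

Usage would look like `delete_snapshot <snapshot-name>` for each snapshot you no longer need, followed by `cleanup_repo`. Space is only released for files that no remaining snapshot still references.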

Hi @warkolm

We take snapshots daily. As far as I understand, we should use Index Lifecycle Management or Curator. I think this will help with disk usage: if we reduce the number of indices, each snapshot will cover fewer indices.
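
For reference, a minimal sketch of what an ILM policy with only a delete phase could look like. The policy name, the age value, ES_AUTH, and the function names are placeholders, not settings from this cluster. Note that removing an index from the cluster does not shrink snapshots that already contain it; that space is only released once those snapshots are deleted as well.

```shell
# Assumptions: host is taken from the thread; ES_AUTH is a hypothetical
# variable holding "user:password".
ES_URL="${ES_URL:-http://10.0.1.2:9200}"

# Print an ILM policy body that deletes indices older than "$1" (e.g. 30d).
ilm_delete_policy_body() {
    printf '{"policy":{"phases":{"delete":{"min_age":"%s","actions":{"delete":{}}}}}}\n' "$1"
}

# Create or update the policy: $1 = policy name, $2 = minimum age.
put_ilm_policy() {
    curl -s -u "$ES_AUTH" -H 'Content-Type: application/json' \
         -XPUT "$ES_URL/_ilm/policy/$1" -d "$(ilm_delete_policy_body "$2")"
}
```

A call such as `put_ilm_policy my-delete-policy 30d` would register the policy; indices still have to be linked to it, for example through the index.lifecycle.name setting in an index template.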
