Hi,
We have a cluster with 3 master, 2 hot, 2 warm, and 2 cold nodes.
We have a path setting in the Elasticsearch config file like this:
We take daily snapshots using a snapshot policy. For local backups we set up NFS file sharing, and we use one of the cold nodes to host the repository, as shown below:
As you can see, our backup folder is 100% full; it has consumed the entire disk.
The cause is the indices/ directory. Let me show you:
[root@cold-2 ~]# du -sh /mnt/data/sharing/es_backup/my_snapshot/indices/
935G /mnt/data/sharing/es_backup/my_snapshot/indices/
[root@cold-2 ~]# ls /mnt/data/sharing/es_backup/my_snapshot/indices/ | wc -l
1569
Deleting an index (with Curator) or deleting a snapshot (with the policy) has no effect on the indices/ directory: I tried deleting an index and a snapshot, but the number of entries in the indices/ directory did not change (still 1569). I am also not sure whether I need to delete the directories under indices/ myself.
What should I do to manage the indices/ directory and reduce its disk usage?
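A simplified way to see why the directory count did not move: snapshots are incremental, so several snapshots can point at the same data under indices/, and a directory can only go away once no remaining snapshot references it. The Python sketch below is a toy model under that assumption, not Elasticsearch's actual code; the snapshot names and index IDs are hypothetical.

```python
from typing import Dict, Set


def unreferenced_after_delete(
    snapshots: Dict[str, Set[str]], to_delete: Set[str]
) -> Set[str]:
    """Return the index directories that no remaining snapshot references.

    `snapshots` maps a snapshot name to the set of index directory names
    (entries under indices/) that the snapshot uses.
    """
    # Snapshots that survive the deletion.
    remaining = {name: ids for name, ids in snapshots.items() if name not in to_delete}
    # Every directory still needed by at least one surviving snapshot.
    still_used: Set[str] = set().union(*remaining.values())
    # Every directory used by the snapshots being deleted.
    deleted_ids: Set[str] = set()
    for name in to_delete:
        deleted_ids |= snapshots.get(name, set())
    # Only directories nobody else needs can actually be removed.
    return deleted_ids - still_used


# Hypothetical repository: three daily snapshots sharing index data.
snapshots = {
    "daily-1": {"idx-a", "idx-b"},
    "daily-2": {"idx-a", "idx-b", "idx-c"},
    "daily-3": {"idx-b", "idx-c"},
}
print(unreferenced_after_delete(snapshots, {"daily-1"}))             # set()
print(unreferenced_after_delete(snapshots, {"daily-1", "daily-2"}))  # {'idx-a'}
```

This only models whole directories. Inside each directory, snapshots also share individual files, so deleting a snapshot can free some space even when `ls indices/ | wc -l` still returns 1569. Deleting a live index (for example with Curator) does not touch the repository at all.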
But I am specifically asking about the /mnt/data/sharing/es_backup/my_snapshot/indices/ directory, because it is consuming the entire disk.
This directory is 935 GB, see:
[root@cold-2 my_snapshot]# du -sh indices/
935G indices/
It contains 1569 directories:
[root@cold-2 my_snapshot]# ls indices/ | wc -l
1569
Here are a few of them:
[root@cold-2 my_snapshot]# ls indices/ | head -10
0_-AGTecSL28Da6dEwsjUw
0CBg7Ty2SemZ2vrliGae4w
0CczLnGhQA2VsAtBg4Tr_g
0cEmqo0jRI22dypbXVeSAg
0f5ylfMZQeWyN5c_w2Cy5Q
0G9FdhgAQQ-IVRMDBrtIiw
0H6RIAdBRrCtF6hgU-VeGQ
0iAeIzaPRICKcF0YePGrSQ
0IiSZaPOSFaYValNVA_UqA
0OuH6KJ-RpGzbTeqXxsJpA
My questions:
If I delete some of them, will it harm the snapshots? And how can I control this directory's size to reduce disk usage?
I have to reduce disk usage; the disk is 100% full now. I already have a snapshot policy that keeps only a few snapshots, but the indices/ directory keeps growing day by day. The policy has no effect on the indices/ directory. What should I do?
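If the policy is an SLM (snapshot lifecycle management) policy, retention is configured on the policy itself through `retention.expire_after`, `min_count`, and `max_count`. Below is a minimal Python sketch that builds such a body, assuming the repository is registered as `my_snapshot` (the actual registered name is not shown above); the retention values are placeholders. The body would be sent with `PUT _slm/policy/<policy-id>`.

```python
import json


def slm_policy_body(repository: str, expire_after: str, min_count: int, max_count: int) -> dict:
    """Hypothetical helper: build a body for PUT _slm/policy/<policy-id>."""
    # Local sanity check (ours, not Elasticsearch's): min_count must not exceed max_count.
    if min_count > max_count:
        raise ValueError("min_count must not exceed max_count")
    return {
        "schedule": "0 30 1 * * ?",      # daily at 01:30, cron syntax
        "name": "<daily-snap-{now/d}>",  # date-math name, resolved when the snapshot is taken
        "repository": repository,
        "config": {"indices": ["*"]},
        "retention": {
            "expire_after": expire_after,  # delete snapshots older than this...
            "min_count": min_count,        # ...but always keep at least this many
            "max_count": max_count,        # and never keep more than this many
        },
    }


body = slm_policy_body("my_snapshot", expire_after="7d", min_count=3, max_count=7)
print(json.dumps(body, indent=2))
```

Retention runs on its own schedule (the `slm.retention_schedule` cluster setting), not when a snapshot is taken, and `POST _slm/_execute_retention` triggers it manually. Only snapshots created by that policy are considered.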
First things first - NEVER delete data that Elasticsearch uses directly from the filesystem. It will cause you endless issues and data loss. ALWAYS use the APIs.
TL;DR: you probably need to delete more snapshots from your repository. How often are you taking them?
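Deleting through the API could look like the Python sketch below: list the snapshots (`GET _snapshot/<repo>/_all`), keep the newest few, delete the rest with `DELETE _snapshot/<repo>/<snapshot>`, then run `POST _snapshot/<repo>/_cleanup` to remove any data no snapshot references. It only builds the request list so it can be reviewed before anything is sent; the repository name, snapshot names, and `keep` value are assumptions.

```python
from datetime import datetime
from typing import List, Tuple
from urllib.parse import quote


def deletion_plan(
    repo: str, snapshots: List[Tuple[str, datetime]], keep: int
) -> List[Tuple[str, str]]:
    """Return (method, path) pairs that delete all but the `keep` newest snapshots."""
    if keep < 1:
        raise ValueError("keep at least one snapshot")
    newest_first = sorted(snapshots, key=lambda s: s[1], reverse=True)
    repo_path = quote(repo, safe="")
    plan = [
        ("DELETE", f"/_snapshot/{repo_path}/{quote(name, safe='')}")
        for name, _ in newest_first[keep:]
    ]
    if plan:
        # Remove repository data that no remaining snapshot references.
        plan.append(("POST", f"/_snapshot/{repo_path}/_cleanup"))
    return plan


# Hypothetical snapshot list, as it might be parsed from GET _snapshot/<repo>/_all.
snaps = [
    ("daily-snap-2024.01.01", datetime(2024, 1, 1)),
    ("daily-snap-2024.01.03", datetime(2024, 1, 3)),
    ("daily-snap-2024.01.02", datetime(2024, 1, 2)),
]
for method, path in deletion_plan("my_snapshot", snaps, keep=2):
    print(method, path)
```

Deleting a snapshot already removes files that only it referenced; the cleanup call mainly catches leftovers from interrupted or failed operations.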
We take snapshots daily. As far as I understand, we should use Index Lifecycle Management (ILM) or Curator. I think this will help with disk usage: if we reduce the number of indices, each snapshot will include fewer indices.
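ILM can delete old indices, but that alone only keeps future snapshots smaller: older snapshots still hold those indices' data until the snapshots themselves are deleted. Below is a minimal Python sketch of an ILM policy body with only a delete phase, optionally using the `wait_for_snapshot` action so an index is deleted only after the named SLM policy has run; `delete_after` and the SLM policy name `daily-snapshots` are hypothetical. The body would be sent with `PUT _ilm/policy/<policy-id>`.

```python
import json
from typing import Optional


def ilm_delete_policy(delete_after: str, slm_policy: Optional[str] = None) -> dict:
    """Hypothetical helper: build a body for PUT _ilm/policy/<policy-id>."""
    actions: dict = {}
    if slm_policy is not None:
        # Wait for the named SLM policy to run before the index is deleted.
        actions["wait_for_snapshot"] = {"policy": slm_policy}
    actions["delete"] = {}
    return {"policy": {"phases": {"delete": {"min_age": delete_after, "actions": actions}}}}


print(json.dumps(ilm_delete_policy("30d", slm_policy="daily-snapshots"), indent=2))
```

In a real hot/warm/cold policy, this delete phase would sit after the existing cold phase rather than on its own.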