Hi. This week, after running _forcemerge?only_expunge_deletes=true, I saw an increase in disk space usage (#1 in the picture). After that I tried reducing the number of replicas from 2 to 1 (#2 in the picture) and again noticed an increase in disk space usage. After setting the number of replicas back, disk space usage increased yet again (#3 in the picture).
After removing some replicas I expected disk usage to drop, and after setting the replica count back I expected usage to return to what it was before, but it just keeps growing and growing.
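Roughly the requests I used, with my-index standing in for the actual index name:

POST /my-index/_forcemerge?only_expunge_deletes=true
PUT /my-index/_settings
{ "index.number_of_replicas": 1 }
PUT /my-index/_settings
{ "index.number_of_replicas": 2 }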
Via the Elasticsearch API I see that only two shards exist on this host, but in the data directory I see four.
Can anyone help me understand why this occurs and how to avoid it in the future?
es version: 6.8.13
number of nodes: 30
number of primary shards: 20
number of replicas: 2
The Elasticsearch API shows that only shards 16 and 17 exist on the current host, but on the filesystem I also have directories 2 and 5. After checking directories 2 and 5 I found a full Lucene index with .liv, .cfs and .cfe files (the translog doesn't take up much space).
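I checked the API with something along these lines and compared the shard numbers it reports for this node against the numbered directories under the node's data path:

GET _cat/shards/my-index?v&h=index,shard,prirep,state,store,node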
Did you also try a forcemerge down to a segment count of 1?
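i.e. something like:

POST /my-index/_forcemerge?max_num_segments=1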
No, I didn't (it's bad for indexing in my case), but I did run _forcemerge?only_expunge_deletes=true, which only increased disk usage on the host.
I really don't understand why Elasticsearch doesn't remove the old relocated shards. Does the Elasticsearch API have an endpoint for explicitly triggering removal of unnecessary data?
Elasticsearch automatically cleans up unused shard data when the shard health is green and none of its copies are relocating. If it can't (e.g. something forbids it from deleting the data) then it will log a warning about the failure.
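You can check those two preconditions with something like (my-index is a placeholder):

GET _cluster/health/my-index
GET _cat/recovery/my-index?v&active_only=true

If health is green and there are no active relocations but the extra shard directories still remain, look in the node logs for warnings about failed shard deletions.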