Hi all!
I'm using ES 5.3
and I have an index with 800k docs ~16GB of data
I deleted ~500k docs and refreshed the index
cpus and disks are not under load
I expected a merge to happen
but after several minutes (20 mins) nothing was happening
launching
POST /_forcemerge?only_expunge_deletes=true
triggered the merge and eventually space was released from disk
merge settings are all default values
Can anyone explain to me why ES does not try to do the merge automatically?
when does ES check if segments need merging?
does it do it after write operations?
does it check periodically?
there are many indexes in the cluster: ~180
could this impact the triggering of the merge?
Elasticsearch (more precisely, Lucene) has something called a merge policy, which takes several factors into account, like the number of segments, their size, and the percentage of deleted documents per segment, to decide when a merge needs to happen automatically in the background.
If you want to know more about this, you should check out the tiered merge policy.
A merge would only happen if needed (as deemed appropriate by TieredMergePolicy).
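To see how the "percentage of deleted documents per segment" looks on a real index, one option is to fetch `GET /_cat/segments/<index>?format=json` and compute the ratio per segment. Below is a minimal Python sketch, assuming the response has already been fetched as a JSON string. The function name and the sample rows (index `my_index`, segments `_a` and `_b`) are made up for illustration; only the `segment`, `docs.count` (live docs) and `docs.deleted` columns are relied on.

```python
import json


def deleted_percent_per_segment(cat_segments_json):
    """Return (segment, deleted_pct) pairs, highest percentage first.

    Expects the JSON text returned by GET /_cat/segments/<index>?format=json,
    where numeric columns arrive as strings.
    """
    rows = json.loads(cat_segments_json)
    result = []
    for row in rows:
        live = int(row["docs.count"])       # non-deleted docs in the segment
        deleted = int(row["docs.deleted"])  # docs marked as deleted
        total = live + deleted
        pct = 100.0 * deleted / total if total else 0.0
        result.append((row["segment"], round(pct, 1)))
    return sorted(result, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample data, not taken from the cluster in this thread.
sample = json.dumps([
    {"index": "my_index", "shard": "0", "segment": "_b",
     "docs.count": "200000", "docs.deleted": "200000"},
    {"index": "my_index", "shard": "0", "segment": "_a",
     "docs.count": "100000", "docs.deleted": "300000"},
])

for segment, pct in deleted_percent_per_segment(sample):
    print(segment, pct)
```

Segments with a high percentage are the ones a merge would reclaim the most space from, though whether the policy actually picks them also depends on the other factors mentioned above (segment count and size).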
As stated in the original question, the problem is not that a merge is not needed.
Rather, it seems that the check for whether a merge is needed is not performed automatically.
I'd like to have a better understanding of when this check is made. Can you help me with this?
Is there any ES configuration related to this?
This is all in Lucene land. Off the top of my head, you need to look at the IndexWriterConfig merge policy. It gets applied whenever segments change (when segments are added, or as a result of previous merges). MergePolicy is the class in question, IIRC.
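To make the "applied whenever segments are changed" point concrete, here is a toy Python model of an event-driven merge check. It is not Lucene code: the `ToyWriter` class, its `merge_factor` parameter and the "merge the smallest N segments" rule are all invented for illustration and are much simpler than what TieredMergePolicy really does. The only thing it tries to show is that the policy is consulted when a segment is added or a merge finishes, not on a timer.

```python
class ToyWriter:
    """Toy model: the merge check runs only when the segment list changes."""

    def __init__(self, merge_factor=3):
        if merge_factor < 2:
            raise ValueError("merge_factor must be at least 2")
        self.segments = []            # segment sizes, arbitrary units
        self.merge_factor = merge_factor
        self.checks = 0               # times the (toy) policy was consulted

    def flush(self, size):
        """A new segment is written, which changes the segment list."""
        self.segments.append(size)
        self._maybe_merge()

    def _maybe_merge(self):
        """Consult the toy policy; a completed merge triggers another check."""
        self.checks += 1
        if len(self.segments) >= self.merge_factor:
            # Made-up rule: merge the smallest `merge_factor` segments.
            self.segments.sort()
            merged = sum(self.segments[:self.merge_factor])
            self.segments = self.segments[self.merge_factor:] + [merged]
            self._maybe_merge()


writer = ToyWriter(merge_factor=3)
for _ in range(3):
    writer.flush(1)

print(writer.segments, writer.checks)
```

In this model, once writes stop, `checks` stops increasing as well: nothing re-evaluates the segments until the next change. If the policy looked at the segments after the last change and decided no merge was worthwhile, waiting longer would not change that outcome.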