I recently had a process go a bit nuts and add a lot of junk documents to some of my indices. It's not the first time this has happened, and I have a cleanup script that runs _delete_by_query and then _forcemerge?only_expunge_deletes=true.
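For readers who haven't used these two endpoints together, here is a minimal Python sketch of such a cleanup. It is not the script from this post: the helper names, the index name, and the query in the test are hypothetical, and `send` assumes an unauthenticated cluster reachable at `base_url`.

```python
import json
import urllib.request


def cleanup_requests(index, query):
    """Return the two cleanup calls, in order, as (method, path, body)."""
    return [
        # Step 1: mark every document matching the query as deleted.
        ("POST", f"/{index}/_delete_by_query", {"query": query}),
        # Step 2: ask for a merge that only rewrites segments holding deletes.
        ("POST", f"/{index}/_forcemerge?only_expunge_deletes=true", None),
    ]


def send(base_url, method, path, body=None):
    """Send one request and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        base_url + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Running it would amount to calling `send(base_url, *call)` for each entry of `cleanup_requests(...)`, first against the test copy and then against production.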
My process is to first pull a copy of the index to a test machine, ensure that the delete/merge works as expected, and then run the process against the production ES cluster. The deletes work, but the forcemerge returns immediately and doesn't do anything. Eg:
Usually, if a force merge doesn't do anything, it's because Elasticsearch has decided the merge isn't worth doing. You can normally rely on Elasticsearch to merge segments as it sees fit, so there's no need to force merge explicitly as you're doing. It shouldn't matter too much if there are a few deleted documents in the index.
Well this site seems to be an excellent rubber duck, because the first thing I did after submitting this post was re-read the _forcemerge docs where I finally noticed the note:
This parameter does not override the index.merge.policy.expunge_deletes_allowed setting.
There's no explicit policy set on the index, so the default applies. That default doesn't seem to be documented outside of the source code, so I had to dig it up from non-official sources: it is 10%. In my case the deleted documents only accounted for 5% of the document count, but they are themselves quite oversized.
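To make the arithmetic concrete, here is a small Python sketch of that threshold check. It assumes, as I understand the merge policy, that the percentage is computed per segment as deleted documents over all documents in the segment (live plus deleted) and that a segment qualifies only when it is strictly above the threshold. The function name and the sample segment figures are made up for illustration.

```python
def segments_eligible_for_expunge(segments, allowed_pct=10.0):
    """Return names of segments whose deleted-doc percentage exceeds allowed_pct.

    `segments` is an iterable of (name, live_docs, deleted_docs) tuples.
    The percentage is by document count, not by size on disk (assumption).
    """
    eligible = []
    for name, live_docs, deleted_docs in segments:
        total = live_docs + deleted_docs
        if total == 0:
            continue  # empty segment, nothing to measure
        deleted_pct = 100.0 * deleted_docs / total
        if deleted_pct > allowed_pct:
            eligible.append(name)
    return eligible


# Hypothetical segments: "_0" has 5% deleted docs, "_1" has 20%.
sample = [("_0", 950, 50), ("_1", 800, 200)]
```

With the 10% default only `_1` would be picked up, which matches what I saw: a segment with 5% of its documents deleted is skipped no matter how large those documents are.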
I ran the following to change the policy for this index:
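A settings update of this kind might look like the Python sketch below. The setting name comes from the docs note quoted above; the helper name, the index name, and the value 0 are placeholders rather than the values actually used, and the sketch assumes the setting can be changed on a live index through the update settings endpoint.

```python
import json


def expunge_threshold_request(index, allowed_pct):
    """Build (method, path, body) for lowering the expunge-deletes threshold."""
    body = {"index.merge.policy.expunge_deletes_allowed": allowed_pct}
    return ("PUT", f"/{index}/_settings", body)


# Placeholder index name and value, for illustration only.
method, path, body = expunge_threshold_request("my-index", 0)
payload = json.dumps(body)
```

After a change like this, rerunning _forcemerge?only_expunge_deletes=true should consider segments with a smaller share of deleted documents.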