_forcemerge query not working? (from Elastic Cloud - ES 2.1.1)

erikthered · February 10, 2016, 4:20pm

Continuing the discussion from _forcemerge query not working?:

Hello,

I have an index that was around 460GB. I performed a delete by query on said index to remove ~3/4 of all the documents from the index. I understand (after reading lots of documentation) that Lucene just flags these documents as deleted, and only removes them later when it merges its segments.

The problem is that Lucene only slightly merged my segments after this delete. The cluster size, after the merge, barely decreased. I was expecting ~200-300GB of space to be freed.

I tried the _forcemerge API (and then even the deprecated _optimize), first setting max_num_segments to 1, then setting only_expunge_deletes. Neither seems to have any effect at all (perhaps they don't override the Elastic Found cluster settings?).
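For reference, the two kinds of request I mean can be sketched roughly like this. It is only a minimal sketch that builds the request URLs without sending anything; the host `http://localhost:9200` and the index name `my_index` are placeholders, not the real values from this cluster.

```python
from urllib.parse import urlencode


def forcemerge_url(base_url, index, **params):
    """Build the URL for a POST to the _forcemerge endpoint of one index."""
    query = urlencode(params)
    url = f"{base_url.rstrip('/')}/{index}/_forcemerge"
    return f"{url}?{query}" if query else url


# Placeholder host and index name (assumptions, not taken from this cluster).
BASE = "http://localhost:9200"
INDEX = "my_index"

# Variant 1: merge everything down to a single segment.
merge_to_one = forcemerge_url(BASE, INDEX, max_num_segments=1)
# Variant 2: only rewrite segments that contain deleted documents.
expunge_only = forcemerge_url(BASE, INDEX, only_expunge_deletes="true")

print(merge_to_one)
print(expunge_only)
```

Both URLs would be sent as POST requests with whatever HTTP client and credentials the cluster requires.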

Running _cat/segments shows, for this index, ~100 segments like this:

index 0 p IP_ADDRESS _55g 6676 259526 492417 10.7gb 15986248 true true 5.3.1 false
index 0 p IP_ADDRESS _dlk 17624 300293 393906 10.5gb 17363328 true true 5.3.1 false
index 0 p IP_ADDRESS _do3 17715 304912 228060 8.3gb 16702931 true true 5.3.1 false
index 0 p IP_ADDRESS _etl 19209 234812 212975 6.2gb 14415090 true true 5.3.1 false
index 0 p IP_ADDRESS _rph 35909 243908 597468 11.6gb 16228232 true true 5.3.1 false
index 0 p IP_ADDRESS _xyq 44018 211087 600916 10.6gb 14982992 true true 5.3.1 false
index 0 p IP_ADDRESS _10f1 47197 150588 563340 7.9gb 12553617 true true 5.3.1 true
index 0 p IP_ADDRESS _10m5 47453 245789 347914 8.2gb 14972958 true true 5.3.1 true
index 0 p IP_ADDRESS _5ujm 272866 182584 1105416 15.5gb 15480243 true true 5.3.1 true
index 0 p IP_ADDRESS _5ujq 272870 175804 820951 11.5gb 14322993 true true 5.3.1 true
index 0 p IP_ADDRESS _5ujw 272876 6557 1411516 6.5gb 7773895 true true 5.3.1 true
index 0 p IP_ADDRESS _5uye 273398 6534 791621 4.3gb 6949108 true true 5.3.1 true
index 0 p IP_ADDRESS _5vtv 274531 12933 811501 6.1gb 10845022 true true 5.3.1 true...

Shouldn't these segments be triggering a merge automatically? In some of them, 99.5% of the documents are deleted.
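This is roughly how the deleted percentage can be worked out from the output above. It is a small sketch that assumes the columns are in the order shown (segment name in the 5th field, live doc count in the 7th, deleted doc count in the 8th); the helper name `deleted_ratio` is just something made up for illustration.

```python
def deleted_ratio(line):
    """Return (segment name, fraction of deleted docs) for one _cat/segments row."""
    fields = line.split()
    segment = fields[4]        # assumed column: segment name
    live = int(fields[6])      # assumed column: docs.count
    deleted = int(fields[7])   # assumed column: docs.deleted
    total = live + deleted
    return segment, (deleted / total if total else 0.0)


# Two rows copied from the output above.
rows = [
    "index 0 p IP_ADDRESS _55g 6676 259526 492417 10.7gb 15986248 true true 5.3.1 false",
    "index 0 p IP_ADDRESS _5ujw 272876 6557 1411516 6.5gb 7773895 true true 5.3.1 true",
]

ratios = dict(deleted_ratio(row) for row in rows)
for segment, ratio in ratios.items():
    print(f"{segment}: {ratio:.1%} deleted")
```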

What's preventing Lucene from cleaning up these segments for me?

Cluster ID: 3daac0

This is a very specific elasticsearch-centric question; I've reposted this in the appropriate topic.

The issue in question is around Elasticsearch 2.1.1.

mikemccand · March 1, 2016, 2:49pm

Hmm there was this bug in Lucene:

https://issues.apache.org/jira/browse/LUCENE-6166

where Lucene failed to trigger merges if all you did was delete and never add any documents. It was fixed in Lucene 5.3.0, and the fix should be included in ES 2.1.1, so you should have seen merges running without having asked for forceMerge. Have you added any documents since doing the deletions?

Are you sure there are no merges running? They will take quite some time to complete. What refresh_interval do you have?
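One way to check for running merges is to look at the merge section of the index stats. The sketch below assumes the response of `GET /<index>/_stats/merges` has the usual `_all.total.merges.current` shape; the sample response is invented and trimmed purely to exercise the function, it is not output from your cluster.

```python
import json


def merges_in_progress(stats_response):
    """Return the number of merges currently running, per an index stats response."""
    return stats_response["_all"]["total"]["merges"]["current"]


# Invented, trimmed response body (assumption, for illustration only).
sample = json.loads('{"_all": {"total": {"merges": {"current": 2}}}}')

print(merges_in_progress(sample))
```

A non-zero value would suggest merges are still in flight and the disk usage has simply not caught up yet.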

Separately, forceMerge definitely should have done something.

Can you post any settings you've changed from defaults?
