Delete_by_query & _forcemerge doesn't free disk space

mats990 · April 24, 2018, 4:24pm

Hi,

I used the delete_by_query API to delete multiple documents, and after that the _forcemerge API to remove the deleted documents.
However, when I call the _forcemerge API it finishes instantly and disk usage stays the same. Why doesn't my API call do anything, and how can I debug the reasons for this behavior? I tried the debug/trace log levels, but I'm not sure which logger to look at, and when I enable trace on the root logger there are too many log lines to find anything useful.
I know I have many deleted documents, so I tried "only_expunge_deletes", but nothing changed. I also tried to force a single segment per shard with "max_num_segments", but again nothing changed.

Any suggestions?

Cluster configuration:
40 data nodes
3 master nodes
ES version 5.6.4
Daily indices (~200GB index size)
shards: 20 primary and 1 replica

dadoonet · April 25, 2018, 4:10am

What call did you launch exactly?

Christian_Dahlqvist · April 25, 2018, 5:35am

How much disk space do you have left on the node? Merging will grow disk usage as all merged segments are created before the old ones are deleted.
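
For anyone following along, one way to check free space per node is the `_cat/allocation` API. This is only a rough sketch: the helper names `allocation_url` and `show_disk` and the `localhost:9200` address are made up for illustration and are not from this thread.

```shell
# Hypothetical helper: build the _cat/allocation URL for a cluster address.
allocation_url() {
  # $1 = base URL of any node, e.g. http://localhost:9200
  printf '%s/_cat/allocation?v&h=node,shards,disk.indices,disk.used,disk.avail,disk.percent' "$1"
}

# Hypothetical helper: print per-node disk usage (needs a reachable cluster).
show_disk() {
  curl -s "$(allocation_url "$1")"
}

# Only prints the URL; call show_disk to actually query the cluster.
allocation_url "http://localhost:9200"; echo
```

The `disk.avail` column is the one to compare against the size of the shards that each node would have to rewrite during the merge.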

mats990 · April 25, 2018, 6:53am

I used (multiple attempts with different params):

curl -XPOST 'http://localhost:9200/indexname/_forcemerge' -d '{
"only_expunge_deletes": false,
"max_num_segments": 1
}'

mats990 · April 25, 2018, 6:56am

There is ~100GB free per node. Index size is ~200GB.
There shouldn't be an issue with free disk space, should there? I know that ES v2 executed _forcemerge without checking whether there is enough disk space.

Christian_Dahlqvist · April 25, 2018, 6:59am

How many nodes do you have in the cluster?

mats990 · April 25, 2018, 7:00am

40 data nodes and 3 master nodes

Christian_Dahlqvist · April 25, 2018, 7:06am

OK, so then the index should take up relatively little space per node.

When I have used it I think I have invoked it like this:

curl -XPOST "http://localhost:9200/indexname/_forcemerge?max_num_segments=1"

Could you try that and see if that makes any difference?
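
To see whether a call made any difference, it may help to compare the segment list before and after the merge with the `_cat/segments` API, looking at the number of segments per shard and the `docs.deleted` column. A minimal sketch, assuming the same `localhost:9200` address and `indexname` as above; the helper names `segments_url` and `show_segments` are hypothetical.

```shell
# Hypothetical helper: build a _cat/segments URL for one index.
segments_url() {
  # $1 = base URL, $2 = index name
  printf '%s/_cat/segments/%s?v&h=shard,prirep,segment,docs.count,docs.deleted,size' "$1" "$2"
}

# Hypothetical helper: list the segments of an index (needs a reachable cluster).
show_segments() {
  curl -s "$(segments_url "$1" "$2")"
}

# Only prints the URL; run show_segments before and after the merge to compare.
segments_url "http://localhost:9200" "indexname"; echo
```

If the merge to a single segment worked, each shard copy should end up with one segment and `docs.deleted` should drop to 0.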

mats990 · April 25, 2018, 7:52am

That seems to be the solution. Is this expected behaviour, or should both requests be valid?

Christian_Dahlqvist · April 25, 2018, 8:03am

The documentation does refer to request parameters, so I suspect it might be expected. It would probably be worthwhile adding an example to the docs though.
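
As a rough sketch of what that means in practice, the options go into the URL query string rather than into a JSON body. The helper name `forcemerge_url` is hypothetical, and the address and index name are the placeholders used earlier in this thread.

```shell
# Hypothetical helper: put force-merge options in the query string, not in a JSON body.
forcemerge_url() {
  # $1 = base URL, $2 = index name, $3 = query string such as max_num_segments=1
  printf '%s/%s/_forcemerge?%s' "$1" "$2" "$3"
}

# Merge down to a single segment per shard.
forcemerge_url "http://localhost:9200" "indexname" "max_num_segments=1"; echo

# Only rewrite segments that contain deleted documents.
forcemerge_url "http://localhost:9200" "indexname" "only_expunge_deletes=true"; echo

# To actually run a merge (can take a long time on a large index):
# curl -XPOST "$(forcemerge_url http://localhost:9200 indexname max_num_segments=1)"
```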

mats990 · April 25, 2018, 8:25am

Great, thanks a lot for the help.
I created a PR for the documentation change: https://github.com/elastic/elasticsearch/pull/30113

