How to clear disk usage

willam_boss · June 21, 2022, 9:23am

health status index uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   sks   QcmAeW2sToa3nqbPF7EVzQ  30   1 3101816690   1219212661     16.8tb          8.4tb

Hi all,
I have a very large Elasticsearch index. The cluster has 20 nodes, each with 16 GB RAM and a 1 TB SSD, and disk usage across the cluster is now at 90%. As you can see, almost 1/3 of my data is deleted, but I can't merge it away with something like (_forcemerge?only_expunge_deletes=false&max_num_segments=1&flush=true)
because the merge will use double the disk storage while it runs, and I have no spare nodes to add.
Is there any way I can decrease my disk usage?

Christian_Dahlqvist · June 22, 2022, 7:35am

Have you tried running _forcemerge?only_expunge_deletes=true to get at least some data cleaned up? I would expect this to reduce disk space at least to some extent, but am not sure how much space it would require (it also depends on whether you already have been forcemerging down to a single segment or not).
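For reference, a minimal Python sketch of how that request could be sent. The host `http://localhost:9200` and the helper names are assumptions for illustration; `sks` is the index from the question. As far as I know, `only_expunge_deletes` only rewrites segments whose share of deleted documents is above `index.merge.policy.expunge_deletes_allowed` (10% by default), so segments with few deletes are left alone.

```python
import urllib.parse
import urllib.request


def forcemerge_url(base_url, index, only_expunge_deletes=True):
    """Build the URL for POST /<index>/_forcemerge."""
    query = urllib.parse.urlencode(
        {"only_expunge_deletes": str(only_expunge_deletes).lower()}
    )
    return f"{base_url.rstrip('/')}/{urllib.parse.quote(index)}/_forcemerge?{query}"


def run_forcemerge(base_url, index):
    """Send the request. The call blocks until the merge has finished."""
    req = urllib.request.Request(forcemerge_url(base_url, index), method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

Calling `run_forcemerge("http://localhost:9200", "sks")` would start the merge; watch free disk space on the nodes while it runs.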

If you have taken a snapshot it might also be worth considering dropping the replica shards during the cleanup. It will make more space available but severely affect resiliency.
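In the output above, store.size is 16.8tb against a pri.store.size of 8.4tb, so the single replica accounts for about half of the disk usage. A sketch of the settings change, assuming the standard update index settings endpoint (the helper name is hypothetical, and the request still has to be sent with whatever HTTP client you use):

```python
import json


def replica_settings_request(index, replicas):
    """Return (method, path, body) for the update index settings API."""
    body = json.dumps({"index": {"number_of_replicas": replicas}})
    return "PUT", f"/{index}/_settings", body


# Drop the replica before the cleanup, restore it once space has been reclaimed.
drop = replica_settings_request("sks", 0)
restore = replica_settings_request("sks", 1)
```

Only do this with a recent snapshot in place, since a single node failure would lose data while the replica count is 0.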

As you are approaching/reaching the limit it does however look like you will need to increase disk space on the nodes or add nodes to properly solve the problem. If you don't you will run into this problem again and at some point it may not be possible to fix it.

willam_boss · June 23, 2022, 7:38am

_forcemerge?only_expunge_deletes=true would be dangerous for my cluster. As far as I know, this operation will use double the disk storage, and my cluster has no free space. I want a way to decrease storage without using any additional disk space.

Christian_Dahlqvist · June 23, 2022, 8:49am

Lucene, which is what powers Elasticsearch, creates immutable segments. Deleting data from an index creates a tombstone record in a new segment, and only when the old segment is merged is the space used by the deleted document cleaned up. Merging triggered by Elasticsearch or forcemerging are therefore the only ways to get rid of deleted data. Any delete will therefore always require additional storage to be available.

If you have lots of small segments, some of these may merge without affecting all the others, which may only require a small amount of additional storage. If, however, you have previously forcemerged down to a single huge segment, that segment will need to be rewritten in order to remove the deleted documents, which will require a lot more space (likely double).
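To put rough numbers on this, here is a small sketch using the figures from the first post. It assumes that a rewritten segment shrinks in proportion to its share of deleted documents and that shards are evenly sized; both are simplifications, and the function names are made up for illustration. With those assumptions about 28% of the documents are deleted and an average primary shard is about 0.28 TB.

```python
def deleted_fraction(docs_count, docs_deleted):
    """Share of documents in the index that are marked deleted."""
    return docs_deleted / (docs_count + docs_deleted)


def rewrite_headroom_bytes(segment_bytes, fraction_deleted):
    """Extra space needed while a segment is rewritten without its deletes.

    Assumes the new segment shrinks in proportion to the deleted share.
    The old segment stays on disk until the new one is complete.
    """
    return segment_bytes * (1.0 - fraction_deleted)


TB = 1024 ** 4
frac = deleted_fraction(3_101_816_690, 1_219_212_661)   # from _cat/indices
shard_bytes = 8.4 * TB / 30                              # average primary shard
worst_case = rewrite_headroom_bytes(shard_bytes, frac)   # shard is one big segment
```

If a shard consists of many smaller segments, the same calculation applies per segment, which is why the extra space needed at any one time can be much lower.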

You therefore need to drop the replica while deleting (temporary workaround) or add more nodes or disk space to the cluster (long term solution).

willam_boss · June 28, 2022, 6:59am

max_num_segments=1
How can I find out how many segments my cluster has?
Would a different max_num_segments value use less storage?

Christian_Dahlqvist · June 28, 2022, 7:08am

It is per index so look at index stats.
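As a sketch of what that could look like: the index stats (`GET /sks/_stats/segments`) report a segment count, and the index segments API (`GET /sks/_segments`) lists every segment per shard copy. The function below summarizes a parsed `_segments` response. The function name is hypothetical and the sample is hand-made in the shape of that response, not real data from this cluster.

```python
def summarize_segments(segments_response, index):
    """Return one row per shard copy: (shard, is_primary, segment_count, deleted_docs)."""
    rows = []
    shards = segments_response["indices"][index]["shards"]
    for shard_id, copies in shards.items():
        for copy in copies:
            segments = copy["segments"].values()
            rows.append((
                int(shard_id),
                copy["routing"]["primary"],
                len(segments),
                sum(seg["deleted_docs"] for seg in segments),
            ))
    return sorted(rows)


# Hand-made sample in the shape of a _segments response (not real data).
sample = {
    "indices": {"sks": {"shards": {
        "0": [{"routing": {"primary": True}, "segments": {
            "_0": {"num_docs": 100, "deleted_docs": 40, "size_in_bytes": 5000},
            "_1": {"num_docs": 50, "deleted_docs": 0, "size_in_bytes": 2000},
        }}],
        "1": [{"routing": {"primary": True}, "segments": {
            "_0": {"num_docs": 80, "deleted_docs": 10, "size_in_bytes": 4000},
        }}],
    }}}
}
rows = summarize_segments(sample, "sks")
```

A shard copy that shows a single segment has most likely been forcemerged down to one segment before, which is the expensive case described earlier in the thread.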

No, but if you have multiple segments it may be possible that only a few are merged at a time, which would require less additional storage for the merge, e.g. less than double the index size.

willam_boss · June 28, 2022, 7:25am

Can I cancel the merge command if my storage turns out not to be enough for the merge?

Christian_Dahlqvist · June 28, 2022, 7:53am

I think you need to resolve the issue as per my previous comments. There is no way around that.

