Elasticsearch not freeing up disk space

srbay · September 14, 2016, 10:45pm

Hi guys,

We have a standard ELK stack with three nodes for Elasticsearch. The three nodes have the same amount of disk space. We haven't configured any special options; this install is just using defaults. Initially we threw a lot of unindexed data at it, and later we properly configured Logstash to parse out fields and such.

At some point all three nodes reached 91% disk usage and no new data was put in. We have about 80 indexes and between 30-50 million docs per index. I searched the Interwebs for how to clear data we didn't need. I'm essentially going to run the following for each index:

curl -XDELETE "localhost:9200/[INDEX]/_query?pretty" -d '{ "query": { "term": { "tags": "_grokparsefailure" } } }'

I let that command chew overnight on an index and it deleted 10 million docs. I cancelled the command (it was still running) and ran curl -XPOST 'http://localhost:9200/INDEX/_forcemerge?only_expunge_deletes=true' to reclaim the disk space.

Looking at the indexes now, I see that the number of deleted docs on the index I tested on is down to about 90k, but it is continuing to increase. I also see that the store size and primary store size have increased by 2 gigs! The amount of free disk space has not changed.

Am I going about this in the right way? We need to free disk space up badly. Why would the store size increase?

Also, if I see the store size and primary store size are the same in the index listing (localhost:9200/_cat/indices?v), does that indicate that nothing is being compressed?

Thanks.

theuntergeek · September 15, 2016, 2:49am

Delete by query is not a good space saving tactic. And a force merge that is only expunging deletes isn't going to reclaim much lost space, if the indices are old and not getting any new documents sent to them. If the older indices aren't getting any new documents indexed, they need a regular force merge.

If your use case supports it, I recommend just deleting older indices outright with something like Elasticsearch Curator. If you need to keep data longer, then you need more nodes to keep the data longer, since you're running out of space.
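As a rough illustration of what age-based cleanup amounts to, here is a minimal Python sketch that picks out daily indices older than a retention window. It is not how Curator is implemented. The function name `indices_older_than`, the 30-day retention, and the sample index list are assumptions made for the example; only the `logstash-YYYY.MM.DD` naming comes from this thread.

```python
from datetime import date, datetime, timedelta


def indices_older_than(names, days, today, prefix="logstash-"):
    """Return daily indices whose date suffix is more than `days` days before `today`."""
    cutoff = today - timedelta(days=days)
    old = []
    for name in names:
        if not name.startswith(prefix):
            continue  # ignore indices that are not daily Logstash indices
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # skip names that do not end in a YYYY.MM.DD date
        if day < cutoff:
            old.append(name)
    return sorted(old)


# Hypothetical index list; in practice it would come from _cat/indices.
names = ["logstash-2016.06.01", "logstash-2016.08.25", "logstash-2016.09.14", ".kibana"]
for name in indices_older_than(names, days=30, today=date(2016, 9, 14)):
    # Dry run: print the request that would drop the whole index.
    print("DELETE /" + name)
```

Dropping a whole index removes its files from disk right away, which is why it frees space far more reliably than deleting individual documents.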

srbay · September 16, 2016, 12:09am

Hi Aaron,

Thanks for the response. How would you recommend going about deleting docs that are not needed that have a particular tag if removing older indices completely is not an option?

Thanks.

theuntergeek · September 16, 2016, 4:03am

As mentioned, you need to do a force merge, or the deleted documents will not be cleaned out on older indices where no new segments are being created (i.e. only_expunge_deletes=true is unnecessary on older indices). You should try to do this during off-peak times.

Also, you should probably plan on more nodes if you can't delete older indices outright. The savings will not be dramatic unless a significant portion of the index is being deleted.
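For reference, a force merge request for one older index could be built like the sketch below. It only constructs the request and does not send it. The helper name `forcemerge_request` is hypothetical, the host is the `localhost:9200` used earlier in the thread, and `max_num_segments=1` is an assumption about what a full merge of a read-only index would typically use.

```python
from urllib.parse import urlencode
from urllib.request import Request


def forcemerge_request(index, host="http://localhost:9200", max_num_segments=None):
    """Build (but do not send) a POST /<index>/_forcemerge request."""
    url = f"{host}/{index}/_forcemerge"
    if max_num_segments is not None:
        url += "?" + urlencode({"max_num_segments": max_num_segments})
    return Request(url, method="POST")


req = forcemerge_request("logstash-2016.08.25", max_num_segments=1)
print(req.get_method(), req.full_url)
# To actually run it against a cluster: urllib.request.urlopen(req)
```

A merge rewrites segments, so it needs free disk space while it runs and puts load on the node, which is another reason to keep it to off-peak hours and to indices that are no longer being written to.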

srbay · September 16, 2016, 10:34pm

Thanks Aaron. Back to one of my original questions though: If I see the store size and primary store size are the same in the index listing (localhost:9200/_cat/indices?v), does that indicate that nothing is being compressed? What exactly does it mean if I see something like this:

health status index               pri rep docs.count docs.deleted store.size pri.store.size
green  open   logstash-2016.08.25   5   1   31659656      5214928     37.1gb         18.5gb

What's the difference between store.size and pri.store.size? Does this mean I'm eating up 37gb of disk space on a given node for this particular index?

theuntergeek · September 28, 2016, 12:27pm

Store size = space consumed by all shards

Primary store size = space consumed by only the primary shards, not including any replicas.

That space will be distributed wherever your shards are.
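To make that concrete for the logstash-2016.08.25 row above, here is a small worked calculation in Python. It assumes the three-node cluster from the first post, shards of roughly equal size, and shards spread as evenly as possible across the nodes; none of that is guaranteed, so treat the per-node figures as estimates.

```python
# Values from the _cat/indices row for logstash-2016.08.25.
store_gb = 37.1          # all shards: primaries plus replicas
pri_store_gb = 18.5      # primary shards only
primaries, replicas = 5, 1
docs_count, docs_deleted = 31659656, 5214928
nodes = 3                # assumption: the three-node cluster from the first post

total_shards = primaries * (1 + replicas)
avg_shard_gb = store_gb / total_shards

# With even balancing, each node holds either the floor or the ceiling
# of total_shards / nodes shards.
min_shards, remainder = divmod(total_shards, nodes)
max_shards = min_shards + (1 if remainder else 0)
low_gb = min_shards * avg_shard_gb
high_gb = max_shards * avg_shard_gb

# Share of documents that are marked deleted but not yet merged away.
deleted_fraction = docs_deleted / (docs_count + docs_deleted)

print(f"{total_shards} shards, about {avg_shard_gb:.2f} GB each")
print(f"per node: roughly {low_gb:.1f} to {high_gb:.1f} GB")
print(f"deleted docs: {deleted_fraction:.1%} of the index")
```

So the 37.1gb is the cluster-wide total for the index, not what any single node holds. Equal store.size and pri.store.size values would just mean there are no replica copies; the two columns say nothing about compression.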
