Elastic and HDD Storage

Csepi · April 27, 2017, 7:25am

Hi There,

I'm just a beginner, and I couldn't find an answer on this topic.
I want Elasticsearch to store no more than 100 GB of data. When I first installed a test version, it filled the 30 GB of storage within 2 weeks and left no free space.
Is there any way to set a maximum storage size or an age limit? (I don't need data older than 2 months.)

Either option would be acceptable.

Best Regards: Peter Cselotei

danielmitterdorfer · April 27, 2017, 7:36am

Hi @Csepi,

Elasticsearch itself cannot do this, but you can use external tools such as Curator and run them regularly. The Curator docs contain an example that deletes older indices based on time. Note that this example relies on the convention that the client application creates one Elasticsearch index per day.
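To illustrate the one-index-per-day convention, here is a minimal Python sketch of the selection step only: given a list of index names, it picks those whose date suffix is older than a cutoff. It is not Curator's implementation. The `packetbeat-` prefix, the `%Y.%m.%d` suffix format, and the function name `indices_older_than` are assumptions made for this example, and the actual deletion (via Curator or otherwise) is deliberately left out.

```python
from datetime import date, datetime, timedelta


def indices_older_than(index_names, max_age_days, today,
                       prefix="packetbeat-", date_format="%Y.%m.%d"):
    """Return the daily indices whose date suffix is older than max_age_days.

    Assumes names look like '<prefix>YYYY.MM.DD' (hypothetical convention);
    anything that does not match is ignored rather than deleted.
    """
    cutoff = today - timedelta(days=max_age_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # not one of our daily indices
        try:
            day = datetime.strptime(name[len(prefix):], date_format).date()
        except ValueError:
            continue  # suffix is not a date, leave the index alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    names = ["packetbeat-2017.02.01", "packetbeat-2017.04.20", ".kibana"]
    # Keep roughly 2 months of data, as asked in the original question.
    print(indices_older_than(names, 60, date(2017, 4, 27)))
```

With the sample names above, only `packetbeat-2017.02.01` is selected; `.kibana` is skipped because it does not follow the daily naming pattern.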

Daniel

theuntergeek · April 27, 2017, 5:44pm

Curator can also filter by size, but there are some important caveats regarding this choice:

Elasticsearch cannot calculate the size of closed indices. Elasticsearch does not keep tabs on how much disk-space closed indices consume. If you close indices, your space calculations will be inaccurate.
Indices consume resources just by existing. You could run into performance and/or operational snags in Elasticsearch as the count of indices climbs.
You need to manually calculate the total space across all nodes. The total you give will be the sum of all space consumed across all nodes in your cluster. If you use shard allocation to put more shards or indices on a single node, it will not affect the total space reported by the cluster, but you may still run out of space on that node.

These are only a few of the caveats. If possible, it's probably wiser to filter by date, rather than by size. You can even combine the two.
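As a rough sketch of what filtering by size amounts to, the Python snippet below picks the oldest indices to drop until the remaining total fits under a limit. It is only an illustration under stated assumptions: the sizes are cluster-wide totals supplied by the caller (so closed indices and per-node imbalance, the caveats above, are not accounted for), index names end in a sortable date such as `YYYY.MM.DD`, and the function name `indices_to_drop_for_size` is hypothetical.

```python
GB = 1024 ** 3


def indices_to_drop_for_size(sizes_by_index, max_total_bytes):
    """Return index names, oldest first, to delete so the rest fits the limit.

    sizes_by_index maps index name -> size in bytes, summed over all nodes.
    Assumes names share a prefix and end in YYYY.MM.DD, so that sorting the
    names alphabetically also sorts them from oldest to newest.
    """
    total = sum(sizes_by_index.values())
    to_drop = []
    for name in sorted(sizes_by_index):
        if total <= max_total_bytes:
            break  # remaining indices already fit under the limit
        to_drop.append(name)
        total -= sizes_by_index[name]
    return to_drop


if __name__ == "__main__":
    sizes = {
        "packetbeat-2017.04.24": 40 * GB,
        "packetbeat-2017.04.25": 40 * GB,
        "packetbeat-2017.04.26": 40 * GB,
    }
    # 120 GB in total against a 100 GB limit: the oldest index has to go.
    print(indices_to_drop_for_size(sizes, 100 * GB))
```

Combining the two approaches would mean applying an age cutoff first and using a size check like this one only as a safety net.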

Csepi · April 28, 2017, 7:28am

@theuntergeek and @danielmitterdorfer
Thanks for the answer!
I still have a lot to learn about the operation and behavior of Elasticsearch to understand everything you just said.
My team and I will review the usage of Curator, and I hope it will solve our problem.

I cannot imagine that nobody has had the same problem before, or that there is no other solution to it.

(My task is to track, and store for 2-4 weeks, every transaction, every single incoming and outgoing bit from our ticketing server and database. I thought this would work with Packetbeat, but it generated a huge file and needed a lot of resources. That's why I introduced the time limitations.)

Best Regards and thanks again: Peter Cselotei

theuntergeek · April 28, 2017, 2:04pm

Well, Curator is my solution to the larger problem of index management. The problem with management by size has to do with those caveats mentioned. There are no easy answers there within Elasticsearch, as the system can route documents differently, resulting in differences of shard and index sizes between nodes.

