I'm just a beginner, but I couldn't find any answer on this topic.
I want Elasticsearch to store at most 100 GB of data, not more. When I first installed a test version, it filled the 30 GB of storage within 2 weeks and left no free space.
Is there any way to set a maximum for the storage used, or to set a date limit? (I don't need data older than 2 months.)
Elasticsearch itself cannot do this, but you can use external tools such as Curator and run them regularly. The Curator docs contain an example that deletes older indices based on time. Note that this example relies on a convention: the client application creates one Elasticsearch index per day.
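To make the one-index-per-day convention concrete, here is a minimal Python sketch of the selection logic only. It is not Curator's code and it does not talk to Elasticsearch. The `packetbeat-YYYY.MM.DD` naming pattern, the function name `indices_older_than`, and the sample index names are assumptions made for illustration.

```python
from datetime import date, datetime, timedelta


def indices_older_than(names, prefix, max_age_days, today):
    """Return the index names whose date suffix is older than max_age_days.

    Assumes daily indices named <prefix>YYYY.MM.DD (an assumed convention).
    """
    cutoff = today - timedelta(days=max_age_days)
    expired = []
    for name in names:
        if not name.startswith(prefix):
            continue  # not one of our daily indices
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, so leave the index alone
        if day < cutoff:
            expired.append(name)
    return expired


# Hypothetical index names, for illustration only.
names = [
    "packetbeat-2017.01.01",
    "packetbeat-2017.03.01",
    "packetbeat-2017.03.20",
    ".kibana",
]
expired = indices_older_than(names, "packetbeat-", 60, today=date(2017, 3, 21))
print(expired)  # ['packetbeat-2017.01.01']
```

The returned names are the ones a scheduled job would then delete. Because the age is read from the index name, this only works if the application really does write one index per day.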
Curator can also filter by size, but there are some important caveats regarding this choice:
Elasticsearch cannot calculate the size of closed indices. Elasticsearch does not keep tabs on how much disk-space closed indices consume. If you close indices, your space calculations will be inaccurate.
Indices consume resources just by existing. You could run into performance and/or operational snags in Elasticsearch as the count of indices climbs.
You need to manually calculate how much space to allow across all nodes. The total you give will be the sum of all space consumed across all nodes in your cluster. If you use shard allocation to put more shards or indices on a single node, it will not affect the total space reported by the cluster, but you may still run out of space on that node.
These are only a few of the caveats. If possible, it's probably wiser to filter by date, rather than by size. You can even combine the two.
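As a rough illustration of combining the two, the sketch below applies an age limit first and then a size budget to whatever remains, dropping the oldest indices until the rest fits. This is one possible policy, not a description of how Curator implements its filters. The function name `select_for_deletion`, the tuple layout, and the sample sizes are all hypothetical, and the sizes are assumed to come from open indices only, given the caveat about closed ones.

```python
from datetime import date, timedelta

GB = 10**9


def select_for_deletion(indices, max_age_days, max_total_bytes, today):
    """Pick indices to delete by age first, then by total size.

    indices: list of (name, day, size_bytes) tuples for open indices.
    Returns the names to delete.
    """
    cutoff = today - timedelta(days=max_age_days)
    # Step 1: everything older than the cutoff goes.
    to_delete = [name for name, day, _ in indices if day < cutoff]
    # Step 2: walk the remaining indices newest first and keep a running
    # total; once the budget is exceeded, every older index is dropped.
    kept = sorted(
        (i for i in indices if i[1] >= cutoff),
        key=lambda i: i[1],
        reverse=True,
    )
    used = 0
    for name, _, size in kept:
        used += size
        if used > max_total_bytes:
            to_delete.append(name)
    return to_delete


# Hypothetical indices and sizes, for illustration only.
indices = [
    ("packetbeat-2017.01.01", date(2017, 1, 1), 20 * GB),
    ("packetbeat-2017.03.18", date(2017, 3, 18), 50 * GB),
    ("packetbeat-2017.03.19", date(2017, 3, 19), 40 * GB),
    ("packetbeat-2017.03.20", date(2017, 3, 20), 30 * GB),
]
doomed = select_for_deletion(indices, 60, 100 * GB, today=date(2017, 3, 21))
print(doomed)
```

Here the January index is removed for age, and the index from March 18 is removed because keeping it would push the running total past 100 GB.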
@theuntergeek and @danielmitterdorfer
Thanks for the answer!
I still have a lot to learn about the operation and behavior of Elasticsearch before I understand everything you just said.
I and my team will review the usage of Curator, and I hope it will solve our problem.
I cannot imagine that nobody has had the same problem before, or that there are no other solutions to it.
(My task is to track, and store for 2-4 weeks, every transaction, every single incoming and outgoing bit from our ticketing server and database. I thought this would work with Packetbeat, but it generated a huge amount of data and needed a lot of resources. That's why I introduced the time limitations.)
Well, Curator is my solution to the larger problem of index management. The problem with management by size has to do with the caveats mentioned above. There are no easy answers there within Elasticsearch, as the system can route documents differently, resulting in different shard and index sizes between nodes.
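A small sketch of why a cluster-wide total can be misleading: with uneven shard placement, the cluster as a whole can sit at the limit you chose while a single node is already over its own disk capacity. The node names, shard sizes, and the 50 GB per-node capacity below are made up for illustration.

```python
from collections import defaultdict

GB = 10**9


def bytes_per_node(shards):
    """Sum shard sizes per node. shards: list of (node_name, size_bytes)."""
    totals = defaultdict(int)
    for node, size in shards:
        totals[node] += size
    return dict(totals)


# Hypothetical shard placement: node-1 holds most of the data.
shards = [
    ("node-1", 45 * GB),
    ("node-1", 40 * GB),
    ("node-2", 10 * GB),
    ("node-3", 5 * GB),
]
totals = bytes_per_node(shards)
cluster_total = sum(totals.values())

# Assumed per-node disk capacity of 50 GB.
overfull = [node for node, used in totals.items() if used > 50 * GB]
print(cluster_total // GB, overfull)
```

The cluster total is exactly 100 GB, so a size filter set to 100 GB would see nothing to delete, yet `node-1` alone holds 85 GB.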