I have a test system and I would like to be able to limit how big it gets in general. The goal is to send events to it and overwrite the old events once the index has reached its limit.
I would like to overwrite the old documents as new ones come in; once I reach a free-space limit on my hard drive, I want to begin overwriting the old data...
There's no automated way to limit growth by size at the moment, but you could put together a system fairly easily.
If you structure your data so that it is indexed into time-based indices (e.g. hourly indices, or whatever makes sense to you), you could have a cron job check the cluster size (via the Cluster Stats API) and delete old indices as required. It's basically a rolling window of data. Most people do retention based on some kind of time metric (30 days, etc.), but you could adapt it to be size-based instead.
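As a rough sketch of what that cron job could look like, here's one way to do it with the Python client. The `events-` prefix, the 50 GB cap, and the cluster URL are all placeholders to adapt to your setup; this assumes date-suffixed index names so that sorting them as strings sorts them chronologically.

```python
# Size-based retention sketch: check total store size via the Cluster Stats
# API and drop the oldest time-based indices until back under the cap.
from elasticsearch import Elasticsearch

MAX_BYTES = 50 * 1024**3            # example size cap: 50 GB (pick your own)
INDEX_PREFIX = "events-"            # assumed naming scheme, e.g. events-2015.03.01

es = Elasticsearch("http://localhost:9200")

def total_store_bytes():
    """Total on-disk size of all indices, from the Cluster Stats API."""
    stats = es.cluster.stats()
    return stats["indices"]["store"]["size_in_bytes"]

def indices_oldest_first():
    """Date-suffixed index names sort chronologically as plain strings."""
    return sorted(es.indices.get(index=INDEX_PREFIX + "*").keys())

# Delete the oldest index until the cluster is back under the cap.
while total_store_bytes() > MAX_BYTES:
    names = indices_oldest_first()
    if len(names) <= 1:             # never delete the index currently being written to
        break
    es.indices.delete(index=names[0])
```

Run it from cron (hourly or daily, depending on how fast your data grows) and it behaves like a rolling window bounded by size rather than by age.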
The downside is that this puts constraints on how you organize your data, since you'll need time-based indices. The alternative is to run periodic search queries that find "old" documents and then bulk-delete them. Certainly doable, just more work, and it will be slower.
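If you go that route, the Delete By Query API is one way to express the "query for old documents, then delete them" step. A minimal sketch, assuming a single `events` index and an `@timestamp` field (both placeholders), with a 30-day cutoff purely as an example:

```python
# Query-then-delete sketch: remove documents older than a cutoff in one call.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

resp = es.delete_by_query(
    index="events",
    body={"query": {"range": {"@timestamp": {"lt": "now-30d"}}}},
)
print("deleted", resp["deleted"], "documents")
```

You would schedule something like this periodically, the same way as the index-deletion job above, but deleting individual documents is slower and heavier on the cluster than dropping whole indices.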