Retention Policy for a single growing index

shrikantgulia · February 23, 2018, 9:30am

I have a single index that grows daily, and I can't use Curator for data retention because it can't delete data from within an index.

How should I proceed?

Any help would be really appreciated.

Regards
Shrikant

theuntergeek · February 26, 2018, 10:19pm

If you can, switch to using multiple indices behind an alias (hourly, daily, weekly...whatever works for your use case). This will allow you to reduce the need for delete_by_query, which is your only other recourse.
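As a rough illustration of why time-based indices make retention easier, here is a minimal Python sketch that names daily indices and picks out the ones older than a retention window. The `logs` prefix, the `logs-YYYY.MM.DD` naming pattern, and both helper functions are assumptions made for the example, not anything prescribed in this thread. The sketch only computes names; actually deleting the indices (with Curator or the delete index API) is left out.

```python
from datetime import date, datetime, timedelta

def daily_index_name(prefix: str, day: date) -> str:
    """Build a daily index name such as 'logs-2018.02.26' (hypothetical pattern)."""
    return f"{prefix}-{day:%Y.%m.%d}"

def expired_indices(prefix: str, names: list[str], today: date, keep_days: int) -> list[str]:
    """Return the index names whose date suffix is older than the retention window."""
    cutoff = today - timedelta(days=keep_days)
    marker = prefix + "-"
    expired = []
    for name in names:
        if not name.startswith(marker):
            continue  # not one of our time-based indices
        try:
            day = datetime.strptime(name[len(marker):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, so leave the index alone
        if day < cutoff:
            expired.append(name)
    return expired

if __name__ == "__main__":
    today = date(2018, 2, 26)
    names = [daily_index_name("logs", today - timedelta(days=n)) for n in range(7)]
    # Whole indices past the window can be dropped without any delete_by_query.
    print(expired_indices("logs", names, today, keep_days=3))
```

Because each index covers a fixed time span, expiring old data becomes a matter of dropping whole indices rather than deleting documents one query at a time.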

shrikantgulia · February 27, 2018, 5:22am

Hello,
Thank you for the reply.

As I am using the Elastic Stack for log storage, going with daily indices will not be possible, because I need to keep the logs for a minimum of three years.

Please guide me on how to proceed.

Regards
Shrikant

theuntergeek · February 27, 2018, 2:58pm

Why are you asking about the index growing daily if you need to keep data for 3 years? Do you only need to keep some of the data for 3 years? If so, then you need to put the data you want to keep for 3 years in one kind of index and data that can be deleted in another kind of index. You can make a retention policy that still uses daily, weekly, or monthly indices and keep what you want in separate indices for 3 years. You can refer to them by alias, allowing them to appear to be a single index, but have them be separate. Again: Aliases will make it possible to make multiple, disparate indices appear to be a single index, if that is what you require for queries. Your data does not all have to reside in a single, enormous index.

Additionally, you can't have a single enormous index do what you appear to be asking for anyway. A single shard should not exceed 50 GB (official recommendation from Elastic), so you'd best plan on having your indices rotate into new ones often enough to not exceed that number.

shrikantgulia · February 27, 2018, 5:39pm

Hello Aaron,
Thank you for the reply.

I want to use Elasticsearch for storage purposes only. I would like to know how to proceed, as I will be storing data from around 140 servers (Windows and Linux), and I want to keep the last three years of data queryable (the index should hold data for the last three years).

Please correct me if I am wrong.

Regards
Shrikant

theuntergeek · February 27, 2018, 6:52pm

I'm really confused. First you want to know how to handle retention for a single large index because Curator can't delete data from an index. Now you explain that you are trying to keep data for 3 years. What you really have is a much broader question concerning index sizing, shard management, and cluster sizing if you want to keep data for 3 years. Instead of asking for the retention policy for a single growing index, you should be asking,

- "How do I plan/architect a cluster to hold 3 years worth of log data?"
- "What are the things that might affect cluster performance?"
- "Is this a good use case for Elasticsearch?"
- "What is the maximum allowable delay before a query returns a value?"
- "Can I close indices and only re-open them if a query on very old data needs to happen?"
- "Where can I find training and/or other educational materials to help me learn how best to do these things?"

shrikantgulia · February 28, 2018, 5:25am

Hello Aaron,

Thank you for correcting me.

Yes, I have many doubts, as I am at an early stage of my learning. Please guide me:
> "How do I plan/architect a cluster to hold 3 years worth of log data?"
(I will be getting around 1500 logs per 15 minutes per server, and I have 140 servers.)
> "What are the things that might affect cluster performance?"

Any help would be really appreciated.

Regards
Shrikant

theuntergeek · March 1, 2018, 5:24am

By my calculations, that means approximately 20,160,000 log lines per day. That's only about 233.33 per second, which is a relatively slow rate. You should index some of this data into a single-node test cluster (like your laptop or local workstation) and see how much of your data an index with a single shard and no replicas can hold. You should apply a good index mapping for this data before beginning. You should see roughly how many docs it takes for that single shard to reach approximately 50 GB in size. This is your single-shard rollover point. You should be indexing into single-shard indices at that rate, and only rolling over (Rollover API) when you hit 50 GB shard sizes. This would theoretically allow you to put up to 30 TB of data on a single node with a 31 GB heap. I'm guessing that's a lot of days' worth of data. Simply add more nodes until you've reached your 3-year mark in terms of capacity. Start expiring old indices at this point.
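The arithmetic behind those numbers can be checked with a short Python sketch. The ingest figures (1500 logs per 15 minutes per server, 140 servers) come from this thread. The average document size is an assumption that has to be replaced with a value measured on your own test index, and the function names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60

def docs_per_day(logs_per_interval: int, interval_minutes: int, servers: int) -> int:
    """Total log lines per day across all servers."""
    intervals_per_day = MINUTES_PER_DAY // interval_minutes
    return logs_per_interval * intervals_per_day * servers

def docs_per_second(per_day: int) -> float:
    """Average indexing rate implied by a daily total."""
    return per_day / SECONDS_PER_DAY

def days_until_rollover(per_day: int, avg_doc_bytes: float,
                        max_shard_bytes: int = 50 * 1024**3) -> float:
    """Rough number of days one single-shard index lasts before reaching the shard size limit.

    avg_doc_bytes is the on-disk size per document; it is an assumption here
    and should be measured on a test index with your real mapping.
    """
    return max_shard_bytes / (per_day * avg_doc_bytes)

if __name__ == "__main__":
    per_day = docs_per_day(logs_per_interval=1500, interval_minutes=15, servers=140)
    print(per_day)                             # daily log lines
    print(round(docs_per_second(per_day), 2))  # average lines per second
    # 500 bytes per document is a placeholder, not a measured value.
    print(round(days_until_rollover(per_day, avg_doc_bytes=500), 1))
```

Once the real per-document size is known, the same calculation gives a first estimate of how often a rollover would happen and how many indices three years of data would need.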

system · March 29, 2018, 5:25am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
