The issue is that not only did I not know I had too many shards, but I also don't know how to reduce them in my scenario.
Have a look at this article.
By points:
How to reduce the number of shards of newly created indices
I have shards set to 1
Reduce the number of replica shards
I do not have replica shards
Reduce the number of primary shards
I cannot do this: the client asked us to delete indexes/records after 180 days. If I do this, the index is recreated even if it is old, and the 180-day "limit" is reset.
Reduce the number of shards with the Shrink API
From what I understand, this also creates a NEW index.
Reduce the number of shards with the Reindex API
Same
Reducing the number of shards of your time-based indices
I think this is the only way, but I'm checking whether the stack has a way to do this automatically or whether I have to write a monthly cron script.
Reducing the number of shards for multi-tenancy indices
N/A
Reducing the number of shards with Filtered Aliases
N/A
It sounds like you are creating over 20 indices per day. Merge these into fewer indices and switch to weekly or monthly indices instead of daily.
I'm pretty sure there are 20 or more.
Merge these into fewer indices and switch to weekly or monthly indices instead of daily.
When I was reading up on how to set up Elastic, I read that for daily login events and time-based data I should always use daily indices.
Switching over to monthly ones doesn't affect me much because the index name is built from a variable, but I just want to know why I should or should not use daily ones. Once I get documentation and an explanation, switching them over is pretty easy, as I just adjust my Logstash configuration.
Thank you
As you can see, a large number of distinct daily indices combined with a long retention period results in a very large number of shards, which is inefficient and potentially problematic. Look at some of the links I provided earlier that recommend ensuring your shards are at least in the GB size range.
I took a look at my indexes and currently the largest one I have is 645.9 MB.
I took a look at the article you posted (thank you) and as I mentioned
- The client wants a 180 day retention for logs
- From the article, the only thing I can do is merge log.2021.08.24, log.2021.08.05, etc. into log.2021.08. The only issue with that is that right now log.2021.08.24 is 2 days old, while if I merge them right now, log.2021.08 would be zero days old.
- Regarding the previous point, looking in Elastic, it seems there is no "automatic" way to do this on a monthly basis. Do I need to write a bash script and run it with cron on the first day of each month?
Thanks
Change the way you index so you start indexing into monthly indices now. Over the next 6 months the daily indices will gradually be phased out of the system. If you want to reduce shard count quicker you will need to manually reindex the daily indices into monthly and then remove the daily indices.
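As a rough sketch of the manual route, the grouping of daily indices into monthly targets could be prepared like this. It only builds request bodies for the Elasticsearch `_reindex` API and does not contact a cluster. The helper name `monthly_reindex_bodies` is hypothetical, and the `log.YYYY.MM.DD` naming is assumed from the index names mentioned earlier in the thread.

```python
import json
from collections import defaultdict


def monthly_reindex_bodies(daily_indices):
    """Group daily index names such as 'log.2021.08.24' by month and build
    one _reindex request body per monthly target index (hypothetical helper)."""
    by_month = defaultdict(list)
    for name in daily_indices:
        # Assumed naming scheme: <prefix>.<year>.<month>.<day>
        prefix, year, month, _day = name.rsplit(".", 3)
        by_month[f"{prefix}.{year}.{month}"].append(name)
    return {
        target: {
            "source": {"index": sorted(sources)},
            "dest": {"index": target},
        }
        for target, sources in sorted(by_month.items())
    }


daily = ["log.2021.08.24", "log.2021.08.05", "log.2021.07.31"]
for target, body in monthly_reindex_bodies(daily).items():
    # Each body would be sent as: POST _reindex
    print(target, json.dumps(body))
```

Each body would be posted to `_reindex`, and the daily indices deleted only after checking that the document counts match. Note that the monthly target gets a fresh creation date, so any age-based deletion of those reindexed indices would need to account for that.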
Are there pros and cons (besides reducing the shard count) to this?
Changing to monthly shouldn't affect much in terms of actual configuration, as my Kibana pattern looks for log.-* and I would just have to change it in Logstash to log-yyyy-mm ....
I just want to make sure that down the road I don't have to change it BACK to daily.......
For this type of data it is often recommended to aim for an average shard size between 20 and 50 GB. Based on that, you would need no more than 10 shards for the data volume currently in the cluster, and you have 400 times that. Given that switching to monthly indices will only reduce the shard count by around a factor of 30, I do not see any risks at all. I would even recommend also consolidating some of the smaller indices if possible.
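To make the factor of 30 concrete, here is a back-of-the-envelope sketch. It assumes roughly 20 index series, one primary shard and no replicas per index, and 180 days of retention, all taken from figures mentioned in this thread. The function name `shard_count` is made up for illustration.

```python
def shard_count(index_series, periods_retained, shards_per_index=1):
    """Total shards = series x retained periods x shards per index."""
    return index_series * periods_retained * shards_per_index


daily = shard_count(20, 180)   # one index per series per day, kept 180 days
monthly = shard_count(20, 6)   # one index per series per month, kept ~6 months

print(daily, monthly, daily // monthly)  # 3600 120 30
```

With these assumed numbers the cluster goes from a few thousand shards to around a hundred, which is still well above the roughly 10 shards suggested by the 20 to 50 GB guideline.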
Well.......if you say so........
I hope this doesn't come back to bite me in the ass
Thanks for the suggestion
OK so now my future indexes are officially monthly (and will delete themselves 180 days after their creation)
To make sure you keep a full 180 days of data you may need to delete indices 210 days after creation.
What do you mean?
I was gonna ask another question: I currently see daily indexes that are about 500 MB in size. With this change, I'll have ONE index that will be 15.5 GB in size (at least)....Is this good?
That does not sound terribly large to me.
As each index now holds 30 days of data rather than 1, deleting 180 days after creation will result in between 150 and 180 days of data in the cluster (180 days will correspond to the age of the oldest data in the index, not the newest). If you want to guarantee at least 180 days in the system, you therefore need to increase the deletion age.
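The arithmetic behind this can be sketched in a few lines. It assumes every index covers the same number of days and is deleted a fixed number of days after its creation. The function name `retained_days` is hypothetical.

```python
def retained_days(index_span_days, delete_after_days):
    """Return (minimum, maximum) days of data present in the cluster.

    Just before the oldest index is deleted, its oldest data is
    delete_after_days old. Just after, the oldest remaining index was
    created index_span_days later, so the window shrinks by that much.
    """
    return delete_after_days - index_span_days, delete_after_days


print(retained_days(30, 180))  # (150, 180) monthly indices, delete at 180 days
print(retained_days(30, 210))  # (180, 210) monthly indices, delete at 210 days
print(retained_days(1, 180))   # (179, 180) daily indices, delete at 180 days
```

Under these assumptions, deleting monthly indices at 210 days is what keeps the minimum at 180 days, whereas with daily indices the same 180-day setting loses at most one day.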
OK, understood....
Also, what do you mean about expanding it to 210?
180 is a bit less than 6 months.....Over time, the daily ones will naturally phase out and the new ones will phase out on the 27th day of the month (so more or less 180 days), correct?
Each index now contains 30 days' worth of data. The first index you create will contain days 1-30. 180 days after this index was created, it will be holding data that is 150-180 days old, as the oldest data in the index was added just after creation. If you delete this index 180 days after creation, you will at that point have only 150 days' worth of data in the cluster, because you delete the full index with all its data.
Math isn't my thing, sorry.....
I think I get what you are trying to say, but it doesn't add up, to me at least.
Let's say that all months have 31 days. Starting from January, I'll have it created at 01-01 ....
Month two Feb, would be created at 02-01 , 31 days have passed since index creation of first
Month three Mar, would be created at 03-01 , 62 days have passed since index creation of first
Month four Apr, would be created at 04-01 , 93 days have passed since index creation of first
Month five May, would be created at 05-01 , 124 days have passed since index creation of first
Month six Jun, would be created at 06-01 , 155 days have passed since index creation of first
Month seven Jul, would be created at 07-01; At some point between June and July, the January index will delete itself.
Is there something I missed?
Let's assume each index covers a full month and that months have the same number of days. The first index covers all of January and is created on January 1st. Six months after the index is created (approximately 180 days) is the beginning of July, which is when the first index is deleted. At that point you only have data for February, March, April, May and June (plus a day or two from July): 5 months' worth of data.
OK, now I understand. Thanks