I have now got a few months' worth of data into ES via Logstash.
Some of the volumes are small, e.g. several hundred MB per day, whereas others are bigger (still not large), e.g. 10-15 GB per day.
Based on these volumes, is it better to use one bigger index, e.g. by month instead of YYYY.MM.DD, or just one overall big index? I did see some posts and also this after some searching:
Seems like the recommendation here is to use LS to merge the daily indexes into one monthly index.
Looking for some guidance here.
Thanks @theuntergeek. On my question about moving away from indexing by YYYY.MM.DD: should I do this, especially for the smaller indexes, or leave it as is and have Curator handle it?
I would perhaps rethink the need for daily indices and use the Rollover API (which is also supported in Curator now) to only "rollover" indices when they have hit a certain number of documents and/or a month/week/number of days in age. This approach could reduce the increase in shard count associated with a lot of small, daily indices.
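As a rough illustration of the rollover idea, here is a minimal Python sketch that only builds the request for the Rollover API (`POST /<alias>/_rollover` with `max_age` and `max_docs` conditions). The alias name `logs_write` and the thresholds are made up for the example, and nothing is actually sent to a cluster.

```python
import json


def build_rollover_request(alias, max_age=None, max_docs=None):
    """Return (method, path, body) for a Rollover API call on a write alias."""
    conditions = {}
    if max_age is not None:
        conditions["max_age"] = max_age    # e.g. "30d": roll over once the index is 30 days old
    if max_docs is not None:
        conditions["max_docs"] = max_docs  # roll over once this many documents are indexed
    return "POST", f"/{alias}/_rollover", {"conditions": conditions}


# "logs_write" is a hypothetical alias pointing at the current write index.
method, path, body = build_rollover_request(
    "logs_write", max_age="30d", max_docs=50_000_000
)
print(method, path)
print(json.dumps(body, indent=2))
```

The index only rolls over when at least one condition is met, so a low-volume source would end up with far fewer indices than one per day.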
Hi @theuntergeek, understood, but as I mentioned, my original approach was to put indexes by day, and the recommendation was to use rollover. So now I am confused. I am trying to understand how I will solve this for my historical data.
My approach going forward is to put everything into one big index and then use rollover.
How do you query your historical data now? If you only use Kibana, then you should be able to define your index pattern so that it matches both your historical data and a newer rollover-friendly pattern.
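To make that concrete, here is a small Python sketch that checks which index names a single wildcard pattern would cover. The index names are hypothetical, and `fnmatchcase` is only a stand-in for index-pattern wildcard matching, not Kibana's own implementation.

```python
from fnmatch import fnmatchcase

# Hypothetical names: daily historical indices and newer rollover indices.
historical = ["logstash-2017.01.15", "logstash-2017.02.01"]
rollover = ["logstash-000001", "logstash-000002"]

pattern = "logstash-*"  # one pattern intended to cover both naming schemes

matched = [name for name in historical + rollover if fnmatchcase(name, pattern)]
print(matched)
```

If both naming schemes share a prefix like this, existing dashboards should not need separate patterns for old and new data.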
You could keep rules in Curator that will slowly purge out your historical data as the indices are currently named without it hurting anything else.
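The kind of age-based selection such a rule performs can be sketched in plain Python. This is not Curator itself, just the logic of picking daily indices by the date in their name; the `logstash-` prefix, the index names, and the 30-day cutoff are assumptions for the example.

```python
from datetime import date, datetime, timedelta


def indices_older_than(names, prefix, days, today):
    """Return daily indices (prefix + YYYY.MM.DD) dated before today - days."""
    cutoff = today - timedelta(days=days)
    old = []
    for name in names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # not a daily index, e.g. a rollover index like logstash-000001
        if day < cutoff:
            old.append(name)
    return old


names = ["logstash-2017.01.01", "logstash-2017.03.20", "logstash-000001"]
to_purge = indices_older_than(names, "logstash-", days=30, today=date(2017, 4, 1))
print(to_purge)
```

Rollover-style names fail the date parse and are skipped, so old daily indices age out without touching the new ones.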
@theuntergeek because the time series volume is low, we do keep several months available, and yes, they're actively being used. We mainly use them to look at trends in Timelion for historical patterns.
I understand that right now data is ingested by LS into ES in daily indices. I could change this to one big fat index and then roll that over. How do I move all the old documents, though?
Seems like the best suggestion may be to use the reindex API via Curator. I'll give that a whirl.
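For anyone following along, here is a minimal Python sketch of the request bodies I have in mind for the Reindex API (`POST _reindex`), merging daily indices into one index per month. The `logstash-` prefix and the index names are assumptions, and the sketch only builds the bodies without sending them.

```python
from collections import defaultdict


def monthly_reindex_bodies(daily_indices, prefix="logstash-"):
    """Group prefix + YYYY.MM.DD indices by month and build one _reindex body per month."""
    by_month = defaultdict(list)
    for name in sorted(daily_indices):
        suffix = name[len(prefix):]        # e.g. "2017.01.15"
        month = suffix.rsplit(".", 1)[0]   # e.g. "2017.01"
        by_month[month].append(name)
    return [
        {"source": {"index": sources}, "dest": {"index": f"{prefix}{month}"}}
        for month, sources in sorted(by_month.items())
    ]


bodies = monthly_reindex_bodies(
    ["logstash-2017.01.02", "logstash-2017.01.01", "logstash-2017.02.01"]
)
for body in bodies:
    print(body)
```

Each body would be one reindex call, and the daily source indices would be deleted only after checking the document counts in the monthly destination.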