Data aggregation and storage for specific time intervals


(Wq Huang) #1

My deployed ES cluster will receive a huge amount of data per second, and every doc it stores has some numeric fields. The aggregation of these fields (min, max, count, average) is done on the fly. The problem is that with so much raw data in ES, the calculation becomes unbearably slow. What's worse, I cannot keep the data for a long time, such as 1 year, because storage space is limited, and there is no need to keep all of the raw data for a whole year anyway.

It would be much better to aggregate at fixed intervals (every 10s, 1min, 10min, etc.), store the aggregated data in ES, and discard the raw data. For example, ES would keep raw data for 1 month, 10s aggregated data for 6 months, and 10min aggregated data for a year. That way I do not need to store the raw data for a year, while the trends in the data are preserved at reduced precision.
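A minimal Python sketch of the rollup scheme described above, independent of ES and only meant to illustrate the idea. The field names `ts` (epoch seconds) and `value`, and the function names `rollup` and `merge_rollups`, are hypothetical and not part of any ES or Logstash API. Each bucket keeps `sum` and `count` so that coarser intervals can be derived from finer ones without averaging averages.

```python
def rollup(docs, interval_s, field="value"):
    """Group raw docs into fixed time buckets and summarize one numeric field.

    Assumes each doc is a dict with an integer epoch-second "ts" key
    (hypothetical name) and a numeric value under `field`.
    """
    buckets = {}
    for doc in docs:
        # Align the timestamp to the start of its bucket.
        start = doc["ts"] - doc["ts"] % interval_s
        v = doc[field]
        b = buckets.get(start)
        if b is None:
            buckets[start] = {"min": v, "max": v, "count": 1, "sum": v}
        else:
            b["min"] = min(b["min"], v)
            b["max"] = max(b["max"], v)
            b["count"] += 1
            b["sum"] += v
    for b in buckets.values():
        b["avg"] = b["sum"] / b["count"]
    return buckets


def merge_rollups(rollups, interval_s):
    """Combine finer buckets (e.g. 10s) into coarser ones (e.g. 10min).

    The average is recomputed from the merged sum and count, so buckets
    with different doc counts are weighted correctly.
    """
    merged = {}
    for start, b in rollups.items():
        coarse = start - start % interval_s
        m = merged.get(coarse)
        if m is None:
            merged[coarse] = {"min": b["min"], "max": b["max"],
                              "count": b["count"], "sum": b["sum"]}
        else:
            m["min"] = min(m["min"], b["min"])
            m["max"] = max(m["max"], b["max"])
            m["count"] += b["count"]
            m["sum"] += b["sum"]
    for m in merged.values():
        m["avg"] = m["sum"] / m["count"]
    return merged


raw = [{"ts": 0, "value": 1.0}, {"ts": 5, "value": 3.0}, {"ts": 12, "value": 5.0}]
ten_sec = rollup(raw, 10)                # 10s buckets starting at 0 and 10
ten_min = merge_rollups(ten_sec, 600)    # one 10min bucket starting at 0
```

Once the 10s buckets are stored, the raw docs can be dropped after their retention period, and the 10min buckets can be built from the 10s ones rather than from the raw data.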

Is there any way to accomplish that? I know statsd+graphite can do the job, but I don't want to deploy and maintain one more system. I am currently using Logstash+ES+Kibana for data filtering, storage, and visualization respectively.

Thanks.

