I am seeking suggestions for managing long-term storage of time-series log data.
I use Logstash 2.2.4, Elasticsearch 2.3.1, and Kibana 4.5 for storing, visualizing, and analyzing logs from pfSense 2.2.6 (and soon 2.3). Daily indexes are created for firewall logs (incoming blocked events by port, src IP, src country, src geoip location, etc.) and bandwidth usage (up and down by port, src IP, dst IP, dst geoip location, etc.). An integer field event_count = 1 is added in Logstash to each firewall event as the doc is created. Sums of the event_count field, rather than document counts, are used in the Kibana firewall visualizations.
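For reference, the event_count field is added with a simple mutate filter, roughly like this (a sketch; the rest of my filter config is omitted):

```
filter {
  mutate {
    # every firewall event contributes exactly 1 to later sums
    add_field => { "event_count" => "1" }
  }
  mutate {
    # convert to a real integer so Elasticsearch maps it as numeric
    convert => { "event_count" => "integer" }
  }
}
```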
My goal is to generate new indexes every month that use the same template as the daily indexes; however, these will hold sums of event_count and KB_up/KB_down rather than each event as its own document. The sums will be grouped by the other fields used in the Kibana visualizations and given a single timestamp for the day (say 12:00:00). The daily indexes will then be deleted.
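For example, the per-group sums could come from a terms aggregation with sum sub-aggregations, something like the following (expressed as a plain dict for elasticsearch-py; the field names mirror my daily mapping and are illustrative):

```python
# One summed bucket per src_ip, with event_count and KB totals.
# In ES 2.x, "size": 0 on a terms aggregation returns all terms.
rollup_query = {
    "size": 0,  # no hits needed, only the aggregation buckets
    "aggs": {
        "by_src_ip": {
            "terms": {"field": "src_ip", "size": 0},
            "aggs": {
                "event_count": {"sum": {"field": "event_count"}},
                "KB_up": {"sum": {"field": "KB_up"}},
                "KB_down": {"sum": {"field": "KB_down"}},
            },
        }
    },
}
```

Each bucket from a query like this would become one document in the monthly index.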
This should reduce the number of documents significantly and would also let Kibana visualizations run seamlessly from near real-time out to years, because the default time step for ranges greater than 30 days is "per day" anyway.
My plan is to do this in Python using the elasticsearch-py helpers and pandas. Is there a more direct way using the Elasticsearch API that I may be missing?
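As a concrete sketch of that plan, the pandas half might look like this (the scan/bulk plumbing is left out, and GROUP_FIELDS, SUM_FIELDS, and the field names are assumptions based on my mapping):

```python
# Sketch: collapse one day's worth of firewall docs into summed docs,
# one per unique combination of the grouping fields. The input docs
# would come from elasticsearch.helpers.scan over a daily index, and
# the output would be written via elasticsearch.helpers.bulk.
import pandas as pd

GROUP_FIELDS = ["src_ip", "dst_port", "src_country"]  # dims used in Kibana
SUM_FIELDS = ["event_count", "KB_up", "KB_down"]      # metrics to sum

def summarize_day(docs, day_stamp):
    """Return summary docs with summed metrics and a single timestamp."""
    df = pd.DataFrame(docs)
    summed = df.groupby(GROUP_FIELDS, as_index=False)[SUM_FIELDS].sum()
    summaries = summed.to_dict(orient="records")
    for doc in summaries:
        doc["@timestamp"] = day_stamp  # e.g. noon of the rolled-up day
    return summaries
```

Each returned dict would then be wrapped in a bulk action targeting the monthly index before the daily index is deleted.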
I've researched aggregations in Elasticsearch and understand how they are generated, but I am missing how they can be used to maintain long-term summaries of the data. Go easy on me... I am new to this...