Best practice to save aggregated data to Elasticsearch for long-term storage?

Hi,

We have a lot of data in ES, with a retention time of 30 days during which all detail data is available.

We've imported the raw logs into ES (parsed into fields, etc., of course).
In our dashboards we do a lot of aggregations across all our big log sources.

Sometimes it is important for us to compare the current data with data that is several months old. Currently we do this in a rather ugly way: we "export" a month view of a bunch of dashboards as an HTML export or a screenshot.

Now I am thinking of a different approach and need some advice on the easiest and most practical way to get there:

  • I would like to create a batch job which queries ES each night (when load on the system is low).
  • This job will aggregate the data over a larger time interval (e.g. 1 or 3 hours).
  • The results of these aggregations then need to go back into ES, but into another index with a longer retention time. I would like to keep this archival index for a year or longer.

The storage needed for the archive will be much, much smaller: for a 6 GB log with 4 million log entries a day, I will keep only 24 entries a day, storing for example the following fields (see the query sketch after the list):

  • count
  • processing time avg
  • processing time min
  • processing time max
  • processing time percentiles 25, 50, 75, 90, 95
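
For illustration, here is roughly what that nightly query could look like. This is a minimal sketch via curl, assuming an index pattern `logs-*`, a `@timestamp` field, and a numeric field `processing_time` (all placeholder names to adapt):

```
# Aggregate yesterday's logs into 1-hour buckets with count, avg/min/max
# and the listed percentiles. Field and index names are placeholders.
curl -s -XPOST 'http://localhost:9200/logs-*/_search?pretty' \
  -H 'Content-Type: application/json' -d '{
  "size": 0,
  "query": {
    "range": { "@timestamp": { "gte": "now-1d/d", "lt": "now/d" } }
  },
  "aggs": {
    "per_interval": {
      "date_histogram": { "field": "@timestamp", "interval": "1h" },
      "aggs": {
        "pt_stats": { "stats": { "field": "processing_time" } },
        "pt_percentiles": {
          "percentiles": {
            "field": "processing_time",
            "percents": [25, 50, 75, 90, 95]
          }
        }
      }
    }
  }
}'
```

The stats sub-aggregation returns count, min, max and avg (plus sum) in one go, so it covers the first four fields.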

So what is the easiest way to achieve that?
I have the following ideas in mind, but I am open to new ones :wink:

Idea 1:

  • generate the aggregation via curl and append to a file
  • ship this file to Logstash. Most of the parsing should be handled by the json codec; then I just change the target index and maybe (if needed) the type to prevent interference with the source data. A config sketch follows below.
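
For the Logstash side of idea 1, a minimal sketch of the pipeline config, written out as a heredoc; the paths, host, index name, and the bucket field `key_as_string` are assumptions to adapt:

```
# Hypothetical pipeline for shipping exported buckets to an archive index;
# assumes the batch job appends one JSON object (= one bucket) per line.
cat > /etc/logstash/conf.d/archive.conf <<'EOF'
input {
  file {
    path => "/var/log/es-archive/aggs-*.json"
    codec => "json"
    start_position => "beginning"
  }
}
filter {
  # set @timestamp from the bucket key (field name is an assumption)
  date { match => ["key_as_string", "ISO8601"] }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "archive-%{+YYYY.MM}"   # separate index, longer retention
    document_type => "aggregated"    # optional; types are phased out in 6.x/7.x
  }
}
EOF
```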

Idea 2:

  • create a shell script
  • do the aggregation
  • insert the results into Elasticsearch directly (see the script sketch below)
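
Idea 2 could look something like the following sketch, assuming jq is available for reshaping the response into bulk format; index, type, and field names are placeholders:

```
#!/bin/sh
# Nightly archival job (sketch): query, reshape with jq, bulk insert.
ES=http://localhost:9200

# 1) Run the aggregation (body as in the curl example above).
curl -s -XPOST "$ES/logs-*/_search" \
  -H 'Content-Type: application/json' \
  -d @aggregation-query.json > /tmp/agg-result.json

# 2) Emit an action line plus a document line per bucket (bulk format).
jq -c --arg idx "archive-$(date +%Y.%m)" '
  .aggregations.per_interval.buckets[]
  | {index: {_index: $idx, _type: "aggregated"}},
    {"@timestamp": .key_as_string,
     count: .doc_count,
     processing_time: .pt_stats,
     processing_time_percentiles: .pt_percentiles.values}
' /tmp/agg-result.json > /tmp/agg-bulk.json

# 3) Bulk insert into the archive index.
curl -s -XPOST "$ES/_bulk" \
  -H 'Content-Type: application/x-ndjson' \
  --data-binary @/tmp/agg-bulk.json
```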

Idea 3:

  • connect to ES via the Java API for querying and inserting.

Idea 4:

  • is there a way to do this inside ES itself?

Scheduling could be done via cron.
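
For example, a nightly crontab entry could look like this (the script path and log file are placeholders):

```
# m  h  dom mon dow  command
30 2 * * * /opt/es-archive/nightly_aggregate.sh >> /var/log/es-archive.log 2>&1
```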

My current favorite is idea 1, because I am familiar with doing aggregations via curl (stealing the query string from Kibana) and I am familiar with Logstash. Also, Logstash takes care of details like bulk inserting and so on.

But maybe there are some downsides or pitfalls I do not see yet.

Ah yes, we are currently on the 5.1.2 stack, but planning to upgrade to the latest 5.x or, better, 6.x. So I do not want to implement something that needs a lot of changes when upgrading to 6.x, or to 7.x when it becomes available.

Thanks a lot, Andreas
