Hi,
We are currently using ES to analyze the last 24 hours of data. Data
arrives at a rate of a few hundred documents per 10-second interval, and
each document carries a timestamp.
We now need to analyze data over a full week. To reduce the amount of
space required, we plan to keep the 24-hour TTL on the raw documents but
roll the data up into one document per minute, and serve queries for data
older than 24 hours (and up to 7 days) from those roll-ups. All fields in
the document need to be aggregated.
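For example (field names here are placeholders, just to make "all fields
need to be aggregated" concrete), one per-minute roll-up document would
stand in for all of the raw documents of that minute:

```python
# A hypothetical per-minute roll-up document (field names are invented);
# it replaces the ~1-2 thousand raw documents that landed in that minute.
rollup_doc = {
    "timestamp": "2014-09-22T10:15:00Z",  # truncated to the minute
    "doc_count": 1800,                    # raw documents in this minute
    "metric_a_min": 12.0,                 # per-field aggregates: min/max/
    "metric_a_max": 98.0,                 # sum/avg for each numeric field
    "metric_a_sum": 123456.0,
    "metric_a_avg": 68.6,
}
```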
So:
Are there any out-of-the-box features we could use to achieve this kind
of roll-up?
What is the best approach (preferably a time-tested one, if someone has
already done this)?
Some approaches we were contemplating (a rough sketch follows the list):
1. Aggregate the data in real time (outside ES) and store the aggregated
data in ES.
2. Periodically (say, once every 30 minutes) run aggregation queries and
write the responses back to ES.
3. Periodically (say, once every 30 minutes) read the new documents with
a time-range query, aggregate them, and store the aggregated data back
into ES in bulk. Maybe use a streaming or paged (scan/scroll) read of the
documents to aggregate them.
4. A combination of 1 and (2 or 3), so that real-time data gets
aggregated as it arrives and data that is delayed (which may happen) can
be merged into the aggregates afterwards using the ES Update API?
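To make approaches 2 and 4 concrete, here is a minimal sketch of the
periodic job we have in mind, using the official Python client. The index
names, field names, and mapping are placeholders, and the
query/aggregation syntax is the 1.x-era form; treat it as an illustration,
not tested code. Approach 3 would replace the aggregation query with a
scan/scroll read and do the per-minute bucketing client-side.

```python
from datetime import datetime, timedelta

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()  # defaults to localhost:9200

RAW_INDEX = "events-raw"        # hypothetical raw-data index
ROLLUP_INDEX = "events-rollup"  # hypothetical roll-up index
WINDOW = timedelta(minutes=30)  # how far back each run looks


def rollup_window(end=None):
    """Roll the last WINDOW of raw documents up into one doc per minute."""
    end = end or datetime.utcnow()
    start = end - WINDOW

    # Approach 2: let ES do the bucketing with a date_histogram
    # aggregation plus a stats sub-aggregation per numeric field.
    body = {
        "size": 0,  # we only need the buckets, not the raw hits
        "query": {"range": {"timestamp": {"gte": start.isoformat() + "Z",
                                          "lt": end.isoformat() + "Z"}}},
        "aggs": {
            "per_minute": {
                "date_histogram": {"field": "timestamp", "interval": "1m"},
                "aggs": {"metric_a": {"stats": {"field": "metric_a"}}},
            }
        },
    }
    resp = es.search(index=RAW_INDEX, body=body)

    # One roll-up document per minute bucket. Using the bucket key as the
    # _id makes the job idempotent: re-running a window after late data
    # arrives simply overwrites the affected minutes, which covers
    # approach 4 without explicit Update API calls.
    actions = []
    for bucket in resp["aggregations"]["per_minute"]["buckets"]:
        stats = bucket["metric_a"]
        actions.append({
            "_index": ROLLUP_INDEX,
            "_type": "rollup",  # needed on 1.x, dropped in later majors
            "_id": bucket["key_as_string"],
            "_source": {
                "timestamp": bucket["key_as_string"],
                "doc_count": bucket["doc_count"],
                "metric_a_min": stats["min"],
                "metric_a_max": stats["max"],
                "metric_a_sum": stats["sum"],
                "metric_a_avg": stats["avg"],
            },
        })
    helpers.bulk(es, actions)
```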
On Tuesday, 23 September 2014 21:47:42 UTC+5:30, Otis Gospodnetic wrote:

We aggregate outside of ES, in memory, and push the results in bulk. We
could still roll up the data already stored in ES later on if we wanted
to, but reading it back out of ES could get expensive.
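In outline, that pipeline buckets incoming events by minute in memory,
folds each event into running aggregates, and flushes closed minutes to ES
in one bulk call. A minimal sketch, reusing the placeholder field names
from above (illustrative only, not production code):

```python
from collections import defaultdict

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()
ROLLUP_INDEX = "events-rollup"  # hypothetical index name

# minute key -> running aggregate for that minute
buckets = defaultdict(lambda: {"doc_count": 0, "metric_a_sum": 0.0,
                               "metric_a_min": float("inf"),
                               "metric_a_max": float("-inf")})


def ingest(event):
    """Fold one raw event into the in-memory per-minute aggregate."""
    # Assumes an ISO timestamp like "2014-09-22T10:15:37Z"; [:16] keeps
    # everything up to the minute.
    minute = event["timestamp"][:16] + ":00Z"
    agg = buckets[minute]
    agg["doc_count"] += 1
    agg["metric_a_sum"] += event["metric_a"]
    agg["metric_a_min"] = min(agg["metric_a_min"], event["metric_a"])
    agg["metric_a_max"] = max(agg["metric_a_max"], event["metric_a"])


def flush():
    """Push all accumulated minutes to ES in one bulk request and reset."""
    # On 1.x each bulk action also needs a "_type".
    actions = [{"_index": ROLLUP_INDEX, "_id": minute,
                "_source": dict(agg, timestamp=minute)}
               for minute, agg in buckets.items()]
    helpers.bulk(es, actions)
    buckets.clear()
```

Flushing only minutes that are safely in the past keeps memory bounded;
events that arrive after their minute has been flushed are exactly the
late-data case the Update API idea above is meant to cover.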