Rollup data in ES

Hi,
We are currently using ES to analyze the last 24 hours of data.
Data arrives at a rate on the order of a few hundred documents per
10-second interval, and each document has a timestamp.
We now need to analyze data over a week. To reduce the space required,
we plan to keep the 24-hour TTL on the raw documents but aggregate the
data into one document per minute, so that data older than 24 hours and
up to 7 days old can still be retrieved. All fields in the document need
to be aggregated.
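To make the expected space savings concrete, here is a small Python sketch. It assumes a hypothetical rate of 300 documents per 10 seconds (standing in for "a few hundred") and a single stream of documents. If rollups are kept per host or per metric, multiply the rollup count by the number of such series. `minute_bucket` is a hypothetical helper that floors a timestamp to the start of its minute, which is the key each rollup document would be grouped under.

```python
from datetime import datetime, timezone

# Hypothetical rate standing in for "a few hundred per 10-second interval".
DOCS_PER_10S = 300

raw_per_minute = DOCS_PER_10S * 6          # raw documents per minute
raw_per_day = raw_per_minute * 60 * 24     # raw documents per day
rollups_per_day = 60 * 24                  # one rollup document per minute
rollups_per_week = rollups_per_day * 7     # rollups covering 7 days


def minute_bucket(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its minute (the rollup key)."""
    return ts.replace(second=0, microsecond=0)


if __name__ == "__main__":
    print(raw_per_day, rollups_per_week)
    print(minute_bucket(datetime(2014, 9, 22, 23, 8, 9, tzinfo=timezone.utc)).isoformat())
```

At this assumed rate, one day of raw data (2,592,000 documents) is 1,800 times larger by document count than one day of per-minute rollups (1,440 documents).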

So:

  1. Are there any out-of-the-box features I can use to achieve this kind
    of rollup?

  2. What is the best approach (preferably a time-tested one, if someone
    has already done this)?

Some approaches we were contemplating:

  1. Aggregate the data in real time (outside ES) and store the aggregated
    data in ES.

  2. Periodically (say, once every 30 minutes) run aggregation queries and
    write the responses back to ES.

  3. Periodically (say, once every 30 minutes) read new documents by time
    range, aggregate them, and bulk-index the aggregated data back into ES,
    perhaps using a streaming or paged read of the documents.

  4. Maybe use a combination of 1 and (2 or 3), so that real-time data is
    aggregated as it arrives and data that arrives late (which can happen)
    is merged into the existing aggregates using the ES Update API?
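For the periodic aggregation-query approach, the request could look roughly like the following Python sketch, which builds the search body as a dict and converts the per-minute buckets in the response into rollup documents. The field names `timestamp` and `value` are hypothetical, and only one numeric field is shown; aggregating "all fields" would need one `stats` sub-aggregation per field. `fixed_interval` is the parameter name in newer Elasticsearch versions, while 1.x-era clusters use `interval` instead. Sending the request and indexing the results are left out.

```python
def rollup_query(start_iso: str, end_iso: str) -> dict:
    """Search body: one date_histogram bucket per minute, with stats on one field.

    Hypothetical field names: "timestamp" (date) and "value" (numeric).
    Newer Elasticsearch versions use "fixed_interval"; 1.x used "interval".
    """
    return {
        "size": 0,  # only the aggregations are needed, not the hits
        "query": {"range": {"timestamp": {"gte": start_iso, "lt": end_iso}}},
        "aggs": {
            "per_minute": {
                "date_histogram": {"field": "timestamp", "fixed_interval": "1m"},
                "aggs": {"value_stats": {"stats": {"field": "value"}}},
            }
        },
    }


def buckets_to_rollups(response: dict) -> list:
    """Turn each non-empty minute bucket of a search response into a rollup doc."""
    docs = []
    for bucket in response["aggregations"]["per_minute"]["buckets"]:
        if bucket["doc_count"] == 0:
            continue  # empty minutes have null min/max; skip them
        stats = bucket["value_stats"]
        docs.append({
            "minute": bucket["key"],  # bucket start as epoch milliseconds
            "count": stats["count"],
            "sum": stats["sum"],
            "min": stats["min"],
            "max": stats["max"],
        })
    return docs
```

Storing `count` and `sum` rather than `avg` keeps the rollups mergeable: the average over any longer window can be recomputed as the total sum divided by the total count.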

Thanks for any advice,
Srinath.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAHhx-GJa_1Qeko20C%3DSaYdWYOJt1EmW-oq8Nj931by4Ab3CDkA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Hi,

We aggregate outside ES, in memory, and push the results to ES in bulk.
We could still roll up the data stored in ES later if we wanted to, but
reading it back from ES could get expensive.
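Otis's approach (aggregate in memory, then push in bulk) might look like this minimal Python sketch. It assumes events arrive as `(epoch_seconds, value)` pairs for a single numeric field; `Stats`, `aggregate`, `bulk_body`, and the index name `rollups-1m` are all hypothetical. It keeps count/sum/min/max so that partial aggregates can be merged, which is one way to fold late-arriving data into a minute before re-sending it. The output is a newline-delimited body for the `_bulk` API; on 1.x-era clusters each action line would also need a `_type`.

```python
import json
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Stats:
    """Mergeable summary of one numeric field (hypothetical structure)."""
    count: int = 0
    total: float = 0.0
    vmin: float = float("inf")
    vmax: float = float("-inf")

    def add(self, v: float) -> None:
        self.count += 1
        self.total += v
        self.vmin = min(self.vmin, v)
        self.vmax = max(self.vmax, v)

    def merge(self, other: "Stats") -> None:
        # Merging two partial summaries equals aggregating all raw values at once.
        self.count += other.count
        self.total += other.total
        self.vmin = min(self.vmin, other.vmin)
        self.vmax = max(self.vmax, other.vmax)


def aggregate(events):
    """events: iterable of (epoch_seconds, value). Returns {minute_start: Stats}."""
    buckets = defaultdict(Stats)
    for ts, value in events:
        buckets[int(ts) - int(ts) % 60].add(value)  # floor to the minute
    return dict(buckets)


def bulk_body(buckets, index="rollups-1m"):
    """NDJSON body for the _bulk API: one action line plus one source line per doc."""
    lines = []
    for minute, s in sorted(buckets.items()):
        # A deterministic _id per minute means re-sending a minute overwrites it.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(minute)}}))
        lines.append(json.dumps({"minute": minute * 1000, "count": s.count,
                                 "sum": s.total, "min": s.vmin, "max": s.vmax}))
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline
```

Because the `_id` is derived from the minute, re-sending a corrected minute replaces the earlier rollup document instead of adding a duplicate.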

Otis

Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/

On Monday, September 22, 2014 11:08:09 PM UTC-4, Srinath C wrote:

Otis, thanks for the reply.

On Tuesday, 23 September 2014 21:47:42 UTC+5:30, Otis Gospodnetic wrote:
