Using aggregations for OLAP


(Roy Jacobs) #1

I am interested in using the new aggregations support to implement
something similar to an OLAP cube.

Let's say I have a big bunch of documents that represent orders. On those
documents I want to calculate a bunch of metrics (using "metrics"
aggregations) based on various fields, stuff like "# of items". Then I want
to group these (using a "bucket" aggregation) by brand, for instance. All of
this is multi-tenant as well, so I need to filter out a whole lot of
irrelevant data for every query.
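As a rough illustration of the request described above, here is a minimal sketch that builds the search body as a Python dict. The field names `tenant_id`, `brand` and `item_count` are hypothetical, `brand` is assumed to be a non-analyzed (keyword) field, and the body assumes a current Elasticsearch version (1.x-era clusters expressed the filter with the since-removed `filtered` query instead of the `bool` query's `filter` clause).

```python
import json


def brand_metrics_request(tenant_id: str, max_brands: int = 10) -> dict:
    """Build a search body: restrict to one tenant, bucket by brand, compute metrics per bucket.

    Field names (tenant_id, brand, item_count) are hypothetical placeholders.
    """
    return {
        "size": 0,  # only the aggregation results are needed, not the matching hits
        "query": {
            "bool": {
                # filter context: limits which documents get aggregated, without scoring
                "filter": [{"term": {"tenant_id": tenant_id}}]
            }
        },
        "aggs": {
            "by_brand": {
                # bucket aggregation: one bucket per distinct brand value
                "terms": {"field": "brand", "size": max_brands},
                "aggs": {
                    # metrics aggregations computed inside each brand bucket
                    "total_items": {"sum": {"field": "item_count"}},
                    "avg_items": {"avg": {"field": "item_count"}},
                },
            }
        },
    }


print(json.dumps(brand_metrics_request("acme"), indent=2))
```

The tenant restriction lives in the query, so only that tenant's orders are fed into the brand buckets.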

The number of documents is quite high (hundreds of millions), so I was
wondering whether aggregations have any form of caching or precalculation,
or whether they have to traverse the entire index every time I run a query.
This could also be quite prohibitive memory-wise.

Has anyone been using aggregations in this manner?

Roy

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/08487eb4-5a1e-4e30-a873-8d0623d4f355%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(davrob) #2

I'm not sure what the 'official' Elasticsearch view on this is, but to me,
from day 1, Elasticsearch has had the capability to do everything that OLAP
cubes can do, in a lightweight, agile way. Creating dimensions in cubes is
the same as pre-indexing calculated fields in the index.

For example, week: 6, month: 2, quarter: 1, year: 2013, decade: second,
century: 21st, etc. are effectively dimensions calculated from a single date
fact, 7th February 2013.
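To make the date example concrete, here is a minimal sketch of deriving those dimension fields once, at index time, so they can be stored on the document and bucketed on directly instead of being computed by a script per query. The function name is hypothetical; "week" is taken to mean the ISO 8601 week number, and "decade" counts 2011-2020 as the second decade of the 21st century (so the "second" above comes out as 2).

```python
from datetime import date


def date_dimensions(d: date) -> dict:
    """Derive OLAP-style dimension fields from a single date (hypothetical helper)."""
    return {
        "week": d.isocalendar()[1],  # ISO 8601 week number
        "month": d.month,
        "quarter": (d.month - 1) // 3 + 1,
        "year": d.year,
        # decade within the century, counting 2011-2020 as the second decade
        "decade": (d.year - 1) // 10 % 10 + 1,
        # century, counting 2001-2100 as the 21st
        "century": (d.year - 1) // 100 + 1,
    }


print(date_dimensions(date(2013, 2, 7)))
# {'week': 6, 'month': 2, 'quarter': 1, 'year': 2013, 'decade': 2, 'century': 21}
```

Each returned key would become its own indexed field alongside the original date.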

The aggregations framework adds an immense amount of power: flexible
aggregations on top of fast search and powerful sorting make a pretty
amazing package for business analytics, without any of the hype and expense
typically associated with OLAP and Business Intelligence.

I guess time will tell on the performance front, but I'm quite optimistic.
In the end, aggregations and facets are just big in-memory map-reduce jobs;
if you pre-calculate a lot of the dimensions you are interested in, rather
than relying on scripts, you should get pretty decent performance.
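The map-reduce view can be sketched in a few lines: each shard aggregates only the documents matching the query into partial per-brand sums (map), and the partial results are then merged (reduce). This is a simplified model under stated assumptions, not Elasticsearch's actual implementation; the field names are hypothetical, and it ignores the `size`/`shard_size` truncation that makes real `terms` bucket counts approximate across shards.

```python
from collections import Counter


def map_shard(docs, tenant_id):
    """Map phase: one shard computes partial per-brand item totals for matching docs."""
    partial = Counter()
    for doc in docs:
        if doc["tenant_id"] == tenant_id:  # only documents matching the query are aggregated
            partial[doc["brand"]] += doc["item_count"]
    return partial


def reduce_partials(partials):
    """Reduce phase: merge the per-shard partial sums into final buckets."""
    total = Counter()
    for p in partials:
        total.update(p)  # Counter.update adds values rather than replacing them
    return dict(total)


shards = [
    [{"tenant_id": "acme", "brand": "x", "item_count": 3},
     {"tenant_id": "other", "brand": "x", "item_count": 100}],
    [{"tenant_id": "acme", "brand": "x", "item_count": 2},
     {"tenant_id": "acme", "brand": "y", "item_count": 5}],
]
result = reduce_partials(map_shard(s, "acme") for s in shards)
print(result)  # {'x': 5, 'y': 5}
```

A sum merges this simply; an average would need each shard to return a (sum, count) pair so the reduce step can divide at the end.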

-David.

(In reply to Roy Jacobs's post #1, Monday, 20 January 2014 15:17:49 UTC.)
