Evaluating elastic search for our use cases


#1

I'm trying to determine whether elastic search will fit our use cases for real time analytics.

We have a pretty sophisticated aggregation use case. We need to be able to ask whether, within a particular time period of resulting data, whether some expression using an aggregate on possibly past periods is true.

As a concrete example, let's say I'm a company that sells products to various accounts. I have the following data of rolled up purchases of products x and y and the amount of money that was spent.

AccountID: A unique ID for every account in my database.
PeriodID: A unique ID for every period of time. Say that period 1 is Q1 of 2011, period 2 is Q2 of 2011, and so on.
Product_x_purchased: Whether product x was purchased sometime during the specified period id by the specified account id.
Product_x_spend: The amount of money spent when purchasing product x by the specified account during the specified period.
and so on...

AccountID   PeriodID   Product_x_purchased?   Product_x_spend   Product_y_purchased?   Product_y_spend
1              1             0                       0                  0                    0
1              2             1                       100                0                    0
1              3             1                       300                0                    0
1              4             0                       0                  0                    0
2              1             1                       1000               0                    0
2              2             0                       0                  1                    300
2              3             1                       250                1                    300
2              4             1                       100                1                    300

I want to be able to ask the following:

  1. In period 4, the accounts that have spent more than 50 dollars: Account 1 (count = 1)
  2. Over all periods, the accounts and periods for which product x spend was greater than 150: {Account 1, Period 3}, {Account 2, Period 1}, {Account 2, Period 3} (count = 3)
  3. In period 4, the accounts that have spent in total more than $300 on product x over the last three periods AND have purchased product y: {Account 2}, (count = 1) (because 250+100>300 and account 1 didn't purchase product y in period 4)
  4. Over all periods, the accounts and periods that have spent in total more than $300 on product x over the last three periods AND have purchased product y: {Account 2, Period 4}, {Account 2, Period 3}, {Account 2, Period 2} (count = 3).

Items 3 and 4 are basically usages of row frame window functions (see for reference http://blog.jooq.org/2013/11/03/probably-the-coolest-sql-feature-window-functions/)

It's possible that we may only need to support items 1 and 3.

I will continue scraping through the docs to determine whether these use cases is supported but it would save me some time if someone could quickly take a look at this and let me know. On the surface, it looks like it's supported. One thing that's not clear to me is whether we can use results of aggregates in higher level filters in Elasticsearch. That would be needed here.

Thanks so much!


(Mark Walkom) #2

At a quick glance I don't think this would present problems.

2.0 will present better options for chained aggs as well.


(system) #3