Rollup Granularity and Aggregation


(Christos Mallios) #1

I am investigating whether I can use rollup as an optimization for some queries. We use Elasticsearch mostly as a data analytics engine, so we run lots of ad-hoc nested aggregations, sometimes with filters. Most of the filters do not exclude much data, except for the datetime filter when it is present. The queries are created by clients and not by us, so it is not easy to materialize the results in advance.

Having said that, there are some known queries that are executed over a sliding window of the last 6 months. This sliding window advances per day and not per month (e.g. one day the window will be 1 Aug - 31 Jan, while the next day it will be 2 Aug - 1 Feb).

I have read a bit about rollup, but I have not completely understood how the pre-aggregated data are stored internally, and whether existing buckets are updated in subsequent runs or not.

So my dilemma regarding granularity is this: daily or monthly? The advantage of monthly granularity is that the rollup indices should be much smaller, and I would expect faster response times as well because less computation is needed. However, I am not sure monthly granularity will work in my case, given the sliding window described above. On the other hand, daily granularity is probably flexible enough for what I need. Any help on that?
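
To make the dilemma concrete, here is a minimal Python sketch of the concern. It is a toy model only, not the Elasticsearch rollup API: it assumes that a pre-aggregated bucket can only be read whole and cannot be split after the fact. The data (one event per day) and the names `daily`, `monthly`, `window_sum_daily` and `window_sum_monthly` are made up for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta


def daterange(start, end):
    """Yield every date from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


# Hypothetical raw data: exactly one event per day, each with value 1.
raw = {d: 1 for d in daterange(date(2023, 7, 1), date(2024, 2, 29))}

# Pre-aggregate at the two candidate granularities.
daily = dict(raw)
monthly = defaultdict(int)
for day, value in raw.items():
    monthly[(day.year, day.month)] += value


def window_sum_daily(start, end):
    """Sum the daily buckets that fall inside [start, end]."""
    return sum(v for d, v in daily.items() if start <= d <= end)


def window_sum_monthly(start, end):
    """Sum every monthly bucket that overlaps [start, end].

    Assumption: a monthly bucket cannot be split, so a month that is
    only partly inside the window is still counted in full.
    """
    return sum(
        v
        for (year, month), v in monthly.items()
        if date(year, month, 1) <= end and (year, month) >= (start.year, start.month)
    )


# Window aligned to month boundaries: 1 Aug - 31 Jan.
aligned = (date(2023, 8, 1), date(2024, 1, 31))
# Window shifted by one day: 2 Aug - 1 Feb.
shifted = (date(2023, 8, 2), date(2024, 2, 1))

print(window_sum_daily(*aligned), window_sum_monthly(*aligned))
print(window_sum_daily(*shifted), window_sum_monthly(*shifted))
```

Under that assumption, the two granularities agree only when the window happens to start and end on month boundaries. Once the window is shifted by a day, the monthly buckets count the whole of August and the whole of February, which is the mismatch I am worried about.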