Help with aggregation/range query

Hi, I have a load of time series data that I want to chart as a stepped chart. However, I don't want all the data and want to aggregate it instead (split the range into buckets and return the min/max for each bucket). This I have working fine, but the problem is that the data is not regular and I end up with null buckets... and my chart therefore has gaps in it.

What I want instead of a null bucket is the value of the previous data point. I am not sure how to structure the query efficiently to achieve this, i.e. without iterating through each null bucket. I potentially have a large number of these charts.

Any help/guidance appreciated.

Nudge

ES doesn't support interpolating buckets in this manner. You can either request all buckets (default, basically min_doc_count: 0) and get null buckets, or you can request that each bucket has a minimum of 1 doc (min_doc_count: 1). But either way you'll have gaps that you need to deal with in your application.
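For reference, the kind of request being discussed could be sketched as a Python dict. The index name, field names, and interval here are placeholders, not anything from the original thread:

```python
# Sketch of a min/max-per-bucket aggregation request body.
# "metrics", "timestamp", "value", and the interval are assumed names.
agg_body = {
    "size": 0,  # we only want the buckets, not the raw hits
    "aggs": {
        "per_bucket": {
            "date_histogram": {
                "field": "timestamp",
                "fixed_interval": "1h",
                "min_doc_count": 0,  # emit empty (null-valued) buckets too
            },
            "aggs": {
                "bucket_min": {"min": {"field": "value"}},
                "bucket_max": {"max": {"field": "value"}},
            },
        }
    },
}
```

With `min_doc_count: 0`, empty buckets still appear in the response, but their `bucket_min`/`bucket_max` values are null; with `min_doc_count: 1` they are omitted entirely. Either way the gaps exist.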

The various pipeline aggregations have basic gap-filling policies, such as inserting zeros. But there's no actual interpolation at the moment, and it only applies to the pipeline aggs (like moving average, etc.).
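As a rough illustration of those policies, a pipeline agg such as a moving average can be told how to treat gaps via `gap_policy` (the agg name and buckets path below are placeholders):

```python
# Hypothetical pipeline agg referencing a "bucket_min" sub-agg.
# "gap_policy" controls what happens at empty buckets:
#   "skip" ignores them, "insert_zeros" substitutes zero.
pipeline_aggs = {
    "smoothed": {
        "moving_avg": {
            "buckets_path": "bucket_min",
            "gap_policy": "insert_zeros",
        }
    }
}
```

Note this only changes what the pipeline agg computes; it does not fill the null buckets in the response itself.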

I'm afraid you'll just have to do the interpolation in your app for now.

FWIW, this is exactly what ES would have to do anyway: collect all the buckets from all the various shards, merge them into a single series, then re-iterate across the buckets to find null values and fill them in. So doing it in your application is no more/less efficient than if ES were to do it... although it is a bit less user-friendly.
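The client-side fill described above (carry the previous bucket's values forward into null buckets, which is what a stepped chart wants) is a single linear pass. A minimal sketch, assuming buckets arrive as ordered dicts with `min`/`max` keys that are `None` when the bucket was empty:

```python
def forward_fill(buckets):
    """Replace null buckets with the previous non-null bucket's values.

    `buckets` is the ordered list of histogram buckets; a "null" bucket
    is one whose min/max are None because no docs fell into it.
    Leading null buckets (nothing to carry forward) are left as-is.
    """
    filled = []
    prev = None  # last bucket that had real values
    for b in buckets:
        if b["min"] is None and prev is not None:
            # copy the bucket, substituting the previous values
            b = {**b, "min": prev["min"], "max": prev["max"]}
        filled.append(b)
        if b["min"] is not None:
            prev = b
    return filled
```

One O(n) pass per chart, so even with a large number of charts the fill cost is dwarfed by the query itself.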

Hi Zachary, thanks for replying. I feared as much :)... but at least now I know...

Is it possible to do a bulk query? I.e. instead of hitting the ES server with multiple requests, could I compile them all and send them in one go? Or am I dreaming :confused:

Thanks!

If they are independent queries (e.g. one query doesn't need part of the results from a previous query), you can batch them together in one call with the MSearch API. It follows a similar format to the Bulk API.

The queries are executed in parallel, and you'll get back a big array of results (one object for each search request, holding all the normal hits and metadata).

It won't save any time with regards to the actual query processing... that's the same regardless. But it will save on network overhead and roundtrips, so if you're consistently firing off multiple queries it can give a nice boost "for free" :slight_smile:
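The msearch body is newline-delimited JSON: a header line (index, etc.) followed by a query line for each search, with a trailing newline. A small helper to assemble it (the header/query contents here are placeholders):

```python
import json

def build_msearch_body(requests):
    """Build the newline-delimited body for a multi-search request.

    `requests` is a list of (header, query) pairs, e.g.
    ({"index": "metrics"}, {"size": 0, "aggs": {...}}).
    """
    lines = []
    for header, query in requests:
        lines.append(json.dumps(header))
        lines.append(json.dumps(query))
    # the body must end with a newline
    return "\n".join(lines) + "\n"
```

The response comes back as one array with a result object per search, in the same order the requests were sent.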