Backward aggregation use case


(Nicolas Froidure) #1

We want to find the day in the past (counting back from now) at which a sum aggregation reaches a given ceiling. Is there a way to do this without retrieving all of the data aggregated per day over a range large enough to guarantee that the ceiling will be reached?


(Nik Everett) #2

You can use a pipeline aggregation to avoid pulling all of the data back into your application, but Elasticsearch will still have to perform the huge aggregation under the covers to test the condition. We don't have anything that optimizes this case, partly because the iteration order on a particular shard is usually random and partly because we don't expect to be able to perform all of the operations for a particular aggregation on a single shard. Usually we have to merge results from many shards to look at the sum.


(Nicolas Froidure) #3

Thanks for replying. I had thought about pipeline aggregations but wasn't sure I could access the keys of the pipelined aggregations: to get the date, I'd have to sum up the buckets of a daily date histogram backward and pick up the key of the bucket where the sum reaches the ceiling.

The catch is that I may need to run the aggregation several times, over successive date ranges, to avoid retrieving the whole data set, since by definition I don't know in advance where the algorithm should stop.
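The backward walk described above can be sketched client-side as follows. This is a minimal illustration, not an Elasticsearch feature: it assumes each daily bucket has already been reduced to a (key, value) pair, newest first, and `fetch_window` is a hypothetical callback standing in for whatever query returns one date range's worth of daily buckets. Because the scan consumes a lazy generator, a new window is only requested when the previous one was exhausted without reaching the ceiling.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

Bucket = Tuple[int, float]  # (day key, e.g. epoch millis; that day's sum)


def find_threshold_day(buckets_newest_first: Iterable[Bucket], ceiling: float) -> Optional[int]:
    """Accumulate daily sums from newest to oldest.

    Returns the key of the first day at which the running total reaches
    `ceiling`, or None if the buckets run out first.
    """
    running = 0.0
    for key, value in buckets_newest_first:
        running += value
        if running >= ceiling:
            return key
    return None


def iter_windows(
    fetch_window: Callable[[int, int], List[Bucket]],
    window_days: int,
    max_windows: int,
) -> Iterator[Bucket]:
    """Yield buckets newest-first, one window of days at a time.

    `fetch_window(offset_days, window_days)` is a hypothetical callback that
    returns the daily buckets for [now - offset - window, now - offset),
    newest first. Windows are fetched lazily, so nothing past the day that
    reaches the ceiling is ever requested.
    """
    for i in range(max_windows):
        yield from fetch_window(i * window_days, window_days)
```

For example, `find_threshold_day(iter_windows(fetch, 7, 52), ceiling)` would look back at most a year, one week per request, and stop as soon as the ceiling is hit. The `max_windows` cap is there so the loop cannot run forever if the ceiling is never reached.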

I understand this is a very specific need though and I didn't expect you to provide a straightforward way to solve this. Thanks anyway!


(Nik Everett) #4

There are almost certainly "scary" things you can do with the scripted metric aggregation and routing values that would get the job done, as long as you can make super sure that all of the documents you need end up routed to a single shard. We don't tend to implement aggregations like this because we want things that scale horizontally, and any solution that routes specific days to specific shards won't scale well, but you can build it with scripted aggregations. It just, well, might be scary to implement and maintain. But it'd be fast.

Your other option, I think, is to use a composite aggregation to page through the aggregation output and do the fancy logic in your own application. That'd be slower than anything really smart, but way faster than pulling all of the documents back into your application: _source loads are generally much slower than aggregations because of the way things are stored.
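A rough sketch of that composite approach: build a composite aggregation whose single date_histogram source is ordered newest-first, sum a field per day, and keep requesting pages with `after` set to the previous page's `after_key` until the running total reaches the ceiling. The field names `timestamp` and `amount`, the aggregation names `per_day`/`daily_sum`, and the `search` callable (which takes a request body and returns the parsed response) are all assumptions for illustration; `calendar_interval` requires a reasonably recent Elasticsearch version (older ones used `interval`).

```python
from typing import Any, Callable, Dict, Optional

Body = Dict[str, Any]


def composite_body(page_size: int, after: Optional[Dict[str, Any]] = None) -> Body:
    """Request body: one page of daily buckets, newest day first."""
    composite: Dict[str, Any] = {
        "size": page_size,
        "sources": [
            {"day": {"date_histogram": {
                "field": "timestamp",        # hypothetical date field
                "calendar_interval": "1d",
                "order": "desc",             # walk backward from now
            }}}
        ],
    }
    if after is not None:
        composite["after"] = after           # resume after the previous page
    return {
        "size": 0,                           # no hits, only aggregations
        "aggs": {
            "per_day": {
                "composite": composite,
                "aggs": {"daily_sum": {"sum": {"field": "amount"}}},  # hypothetical field
            }
        },
    }


def walk_backward(search: Callable[[Body], Body], ceiling: float, page_size: int = 100) -> Optional[Any]:
    """Page through daily sums until the running total reaches `ceiling`.

    Returns the day key of the bucket that crosses the ceiling, or None if
    the data runs out first. `search` is a hypothetical callable that sends
    the body to the cluster and returns the decoded JSON response.
    """
    running = 0.0
    after: Optional[Dict[str, Any]] = None
    while True:
        agg = search(composite_body(page_size, after))["aggregations"]["per_day"]
        for bucket in agg["buckets"]:
            running += bucket["daily_sum"]["value"]
            if running >= ceiling:
                return bucket["key"]["day"]
        after = agg.get("after_key")
        if not agg["buckets"] or after is None:
            return None                      # no more pages
```

Each page still costs a full aggregation on the cluster, but only `page_size` small buckets travel back per request, and the loop stops at the first page that crosses the ceiling.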