Backward aggregation use case

nfroidure · November 29, 2018, 3:05pm

We want to find a day in the past (counting back from now) on which a sum aggregation reaches a given ceiling. Is there a way to do this without retrieving all of the data aggregated per day over a range large enough to ensure that the ceiling is reached?

nik9000 · November 30, 2018, 5:41pm

You can use a pipeline aggregation to avoid pulling all of the data back into your application, but it will still have to perform the huge aggregation under the covers to test the condition. We don't have anything to optimize such a thing, partly because the iteration order on a particular shard is usually random and partly because we don't expect to be able to perform all of the operations for a particular aggregation on a single shard. Usually we have to merge results from many shards to look at the sum.
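As a rough illustration of the pipeline approach, here is a minimal sketch of a request body that computes a sum per day plus a running total with `cumulative_sum`. The field names `timestamp` and `amount` and the `now-90d/d` range are assumptions made up for the example, and `calendar_interval` assumes a recent Elasticsearch version (older releases used `interval`).

```python
import json


def build_running_sum_request(date_field, value_field, since="now-90d/d"):
    """Build a search body with daily sums and a running (cumulative) sum."""
    return {
        "size": 0,  # only aggregations are needed, not the hits
        "query": {"range": {date_field: {"gte": since}}},
        "aggs": {
            "per_day": {
                "date_histogram": {
                    "field": date_field,
                    "calendar_interval": "day",
                },
                "aggs": {
                    # Metric computed for each daily bucket.
                    "daily_sum": {"sum": {"field": value_field}},
                    # Pipeline aggregation reading the sibling metric.
                    "running_sum": {
                        "cumulative_sum": {"buckets_path": "daily_sum"}
                    },
                },
            }
        },
    }


# Hypothetical field names, for illustration only.
body = build_running_sum_request("timestamp", "amount")
print(json.dumps(body, indent=2))
```

The running sum here goes from the oldest bucket to the newest, so a backward total for a given day would be the grand total minus the running sum of the day before it. The range still has to be chosen large enough up front, which is the limitation described above.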

nfroidure · December 3, 2018, 10:04am

Thanks for replying. I thought about pipeline aggregations but was not sure I could access the keys of the pipelined aggregations: to get the date, I'd have to sum all the buckets of a daily date histogram backward and then pick the key of the bucket to know the date I'm looking for.

The thing is that, in that process, I may need to iterate several times over the aggregations to avoid retrieving the whole bunch of data, since by definition I do not know when the algorithm should stop.
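The backward summing described here can be sketched on the client side, once the daily buckets are available. This is only an illustration: the bucket shape mimics a `date_histogram` response with a `sum` sub-aggregation named `daily_sum`, and both that name and the sample values are made up (real `key_as_string` values usually include a time part).

```python
def find_day_reaching(buckets, threshold, value_key="daily_sum"):
    """Walk daily buckets from newest to oldest.

    Returns (bucket key, backward total) for the first day at which the
    backward sum reaches the threshold, or None if it never does.
    """
    total = 0.0
    for bucket in reversed(buckets):
        # Treat a missing (null) value as zero.
        total += bucket[value_key]["value"] or 0.0
        if total >= threshold:
            return bucket["key_as_string"], total
    return None


# Made-up sample buckets, oldest first, as a date histogram returns them.
buckets = [
    {"key_as_string": "2018-11-26", "daily_sum": {"value": 40.0}},
    {"key_as_string": "2018-11-27", "daily_sum": {"value": 25.0}},
    {"key_as_string": "2018-11-28", "daily_sum": {"value": 30.0}},
    {"key_as_string": "2018-11-29", "daily_sum": {"value": 10.0}},
]

result = find_day_reaching(buckets, 60)
print(result)  # ('2018-11-27', 65.0)
```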

I understand this is a very specific need though and I didn't expect you to provide a straightforward way to solve this. Thanks anyway!

nik9000 · December 3, 2018, 3:06pm

There are almost certainly "scary" things you can do with the scripted metric aggregation and routing values that'd get the job done, so long as you can make super-sure that all of the things you need to be on a single shard are routed to a single shard. We don't tend to implement aggregations like this because we want things that scale horizontally, and any solution involving routing specific days to specific shards won't scale well, but you can build it with scripted metric aggregations. It just, well, might be scary to implement and maintain. But it'd be fast.

Your other option, I think, is to use composite aggregations to walk the output of aggregations and do fancy things in your own application. That'd be slower than anything really smart, but way faster than pulling all of the documents back into your application. _source loads are generally much slower than aggregations because of the way that things are stored.
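A minimal sketch of that composite approach might look like the following. It pages through daily buckets from newest to oldest and stops as soon as the backward sum reaches the threshold. The `search` argument is a hypothetical callable that takes a request body and returns the parsed response (for example a thin wrapper around whatever client you use); the field names `timestamp` and `amount` are also assumptions, and `calendar_interval` assumes a recent Elasticsearch version.

```python
def composite_body(date_field, value_field, page_size=30, after=None):
    """Build one page of a composite aggregation over days, newest first."""
    composite = {
        "size": page_size,
        "sources": [
            {
                "day": {
                    "date_histogram": {
                        "field": date_field,
                        "calendar_interval": "day",
                        "order": "desc",
                    }
                }
            }
        ],
    }
    if after is not None:
        composite["after"] = after  # resume after the previous page
    return {
        "size": 0,
        "aggs": {
            "per_day": {
                "composite": composite,
                "aggs": {"daily_sum": {"sum": {"field": value_field}}},
            }
        },
    }


def walk_back_until(search, threshold, date_field="timestamp",
                    value_field="amount"):
    """Page backward through daily sums until the total reaches threshold.

    `search` is a hypothetical callable: request body in, response dict out.
    Returns (day key, backward total), or None if the data runs out first.
    """
    total, after = 0.0, None
    while True:
        response = search(composite_body(date_field, value_field, after=after))
        agg = response["aggregations"]["per_day"]
        for bucket in agg["buckets"]:
            total += bucket["daily_sum"]["value"] or 0.0
            if total >= threshold:
                return bucket["key"]["day"], total
        after = agg.get("after_key")
        if not agg["buckets"] or after is None:
            return None  # no more pages
```

Because each page is requested only when the previous one was not enough, the application never needs to guess a range in advance; the cost is one round trip per page.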

system · December 31, 2018, 3:06pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.