I have a cluster with millions of documents. Each document represents a music transaction.
I want to calculate percentiles on the price field AND to sum prices by percentile bucket.
My document looks like this:

```json
"_source" : {
  "commercialModel" : "SubscriptionModel",
  "creationDate" : 1529307715833,
  "dspCode" : "MUSIC-ONLINE",
  "duration" : 78,
  "price" : 0.57370562886557345,
  "nbTx" : 1,
  "releaseTitle" : {
    "title" : "Se Cacher Sous L'arbre Sous la Pluie (Hiding Under the Tree in the Rain)",
    "title_lower" : "se cacher sous l'arbre sous la pluie (hiding under the tree in the rain)"
  },
  "releaseType" : "TrackRelease"
}
```
So you'd like, say, the sum of the price field from all documents whose price was in the 95th percentile?
If that's the case, I'm afraid it's not currently possible in a single request. Elasticsearch would have to make a complete pass over the data to determine the global percentiles. Because data in Elasticsearch is sharded, the "local" percentile values can be very inaccurate. For example, one shard might hold values from 0 to 10 and another shard values from 10 to 100. Viewed locally, the first shard would put the 95th percentile near 10, but viewed globally the 95th percentile is closer to 100.
This means we can't "bucket" documents by percentile because we don't know where the bucket boundaries are until we've consumed all the data.
You can achieve it with two sequential requests, though. First calculate the percentiles, then issue a second query that uses a range aggregation whose boundaries are the percentile values you care about. Alternatively, use a simple filter query that restricts the documents to those with a price above the 95th percentile.
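The two sequential requests could look something like the following sketch. The index name `transactions` and the percentile value plugged into the second request are placeholders; use your own index name and the value actually returned by the first request.

```
# Step 1: compute the global 95th percentile of price
# ("transactions" is a placeholder index name)
POST /transactions/_search
{
  "size": 0,
  "aggs": {
    "price_p95": {
      "percentiles": { "field": "price", "percents": [ 95 ] }
    }
  }
}

# Step 2: plug the value returned above (0.95 here is a placeholder)
# into a range aggregation and sum prices on each side of the boundary
POST /transactions/_search
{
  "size": 0,
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [ { "to": 0.95 }, { "from": 0.95 } ]
      },
      "aggs": {
        "total_price": { "sum": { "field": "price" } }
      }
    }
  }
}
```

If you only care about the documents above the 95th percentile, you can instead put a `range` filter on `price` in the query and attach a single `sum` aggregation.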