How to sum values per percentile?

I have a cluster with millions of documents. Each document represents a music transaction.
I want to calculate percentiles with the field price AND to sum prices by percentiles.

My document looks like

"_source" : {
          "commercialModel" : "SubscriptionModel",
          "creationDate" : 1529307715833,
          "dspCode" : "MUSIC-ONLINE",
          "duration" : 78,
          "price" : 0.57370562886557345,
          "nbTx" : 1,
          "releaseTitle" : {
            "title" : "Se Cacher Sous L'arbre Sous la Pluie (Hiding Under the Tree in the Rain)",
            "title_lower" : "se cacher sous l'arbre sous la pluie (hiding under the tree in the rain)"
          "releaseType" : "TrackRelease",

I can calculate percentiles:

GET /_search
    "size": 0,
    "aggs" : {
        "incomes" : {
            "percentiles" : {
                "field" : "price" 

But what I need is the sum of the prices per percentile.

So you'd like say, the sum of price field from all documents where the price was in the 95th percentile?

If that's the case, I'm afraid it's not currently possible with one request. We have to do a complete pass over the data to determine the global percentiles. Due to the sharded nature of data in Elasticsearch, the "local" percentile values could be very incorrect. E.g. you might have data from 0-10 on one shard, and 10-100 on a different shard. When viewed locally, the first shard would think the 95th percentile was near 10. But when viewed globally, the 95th percentile is closer to 100.

This means we can't "bucket" documents by percentile because we don't know where the bucket boundaries are until we've consumed all the data.

You can achieve it with two sequential requests though. First to calculate the percentiles, then a second query that uses a range aggregation which defines the different percentiles you care about. Or just a filter query which restricts all documents to those that had a price > 95th percentile.

1 Like

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.