Percentile Rank query giving unexpected results

Jonathan_Tan · February 23, 2021, 10:08pm

Hi,

I'm getting unexpected results from the percentile_ranks aggregation, and was looking for some help/clarification.

I've got 9 documents, all of which have a position field.
The position values for the 9 documents are:

[1, 1, 1, 1, 2, 2, 3, 4, 8]

When I run the following percentile_ranks aggregation:

{
  "size": 0,
  "aggs": {
    "percentile_test": {
      "percentile_ranks": {
        "field": "position",
        "values": [
          1,
          3
        ]
      }
    }
  }
}

The documentation at Percentile ranks aggregation | Elasticsearch Guide [8.11] | Elastic says that

Percentile rank shows the percentage of observed values which are below certain value

implying that for my value of 1, I should actually get a value of 0, since I have no documents with a position of less than 1.

Instead, I get

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 9,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "percentile_test": {
      "values": {
        "1.0": 33.33333333333333,
        "3.0": 72.22222222222221
      }
    }
  }
}

This says the percentile rank for a position of 1 is 33.33%, i.e. that a third of my documents have a position below 1.
Similarly, with 6 out of 9 documents having a value less than 3, I'd expect the percentile rank for 3 to be 66.67%. Instead, it's coming back as 72.22%.
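
As a sanity check, the exact ranks for these nine values can be worked out by hand. The sketch below is plain Python, not Elasticsearch's algorithm, and the three function names are made up for illustration. It computes the rank under three common definitions: strictly below, less than or equal, and a midpoint rule that counts half of the ties.

```python
# Exact percentile ranks for the nine position values, computed three ways.
# This is NOT how Elasticsearch computes them; it is only a reference point.
positions = [1, 1, 1, 1, 2, 2, 3, 4, 8]


def rank_strict(values, x):
    """Percentage of values strictly below x."""
    return 100.0 * sum(v < x for v in values) / len(values)


def rank_inclusive(values, x):
    """Percentage of values less than or equal to x."""
    return 100.0 * sum(v <= x for v in values) / len(values)


def rank_midpoint(values, x):
    """Strictly-below count plus half of the ties, as a percentage."""
    below = sum(v < x for v in values)
    ties = sum(v == x for v in values)
    return 100.0 * (below + 0.5 * ties) / len(values)


for x in (1, 3):
    print(x, rank_strict(positions, x), rank_inclusive(positions, x),
          rank_midpoint(positions, x))
```

For 3 this gives roughly 66.67, 77.78 and 72.22, so the returned 72.22 happens to equal the midpoint figure (which may be a coincidence). For 1 it gives 0, 44.44 and 22.22, and none of these equals the returned 33.33.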

I'm aware of the caveat in Percentiles aggregation | Elasticsearch Guide [8.11] | Elastic about percentiles being approximate, but this still seems odd, for two reasons:

For the value 1, I'm asking for the percentage of values below the smallest value in the set. Nothing is smaller, so why does it ever come up with a value > 0?
The same page says that for small data sets the results are highly accurate, potentially even 100% accurate, and 9 documents is a very small data set.
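
For comparison, an exact "percentage strictly below" can be obtained without percentile_ranks by counting documents with a range filter per threshold. The following is only a sketch under assumptions: the helper names (build_exact_rank_query, exact_ranks) and the aggregation names (below_0, below_1) are hypothetical, the sample response is hand-written rather than real Elasticsearch output, and every document is assumed to have the position field so that the total hit count is the right denominator.

```python
def build_exact_rank_query(field, thresholds):
    """Build a search body with one range-filter aggregation per threshold."""
    aggs = {
        f"below_{i}": {"filter": {"range": {field: {"lt": t}}}}
        for i, t in enumerate(thresholds)
    }
    # track_total_hits asks for an exact total rather than a capped estimate.
    return {"size": 0, "track_total_hits": True, "aggs": aggs}


def exact_ranks(response, thresholds):
    """Turn the per-threshold doc counts into percentages of all hits."""
    total = response["hits"]["total"]["value"]
    return {
        t: 100.0 * response["aggregations"][f"below_{i}"]["doc_count"] / total
        for i, t in enumerate(thresholds)
    }


thresholds = [1, 3]
body = build_exact_rank_query("position", thresholds)

# Hand-written response in the shape a search returns for filter aggregations,
# filled in with the counts expected for [1, 1, 1, 1, 2, 2, 3, 4, 8].
sample_response = {
    "hits": {"total": {"value": 9, "relation": "eq"}},
    "aggregations": {"below_0": {"doc_count": 0}, "below_1": {"doc_count": 6}},
}
print(exact_ranks(sample_response, thresholds))
```

The body built here would be sent to the index's _search endpoint in place of the percentile_ranks request. It costs one filter per threshold, so it suits a handful of thresholds rather than many.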

Can anybody provide some help/answers/opinions/perspective please?

I also note that similar questions about percentile rank have been asked in the past without any answers, so clearly this is something that others have found puzzling too.

Thanks in advance!

Jonathan_Tan · March 18, 2021, 12:17am

Bump?
Nobody else able to respond to this with some experience?

system · April 15, 2021, 12:18am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
