Getting doc_count in percentiles aggregation

Hi,
I would like to get the doc_count value in a percentiles aggregation. For example, the request
GET /my_index/_search
{
  "query": {
    ...
  },
  "aggregations": {
    "my_agg": {
      "percentiles": {
        "script": "_score",
        "percents": [85],
        "keyed": false
      }
    }
  }
}
would return the following response:
...
"aggregations": {
  "my_agg": {
    "values": [
      {
        "key": 85,
        "value": 2.674367666244506
      }
    ]
  }
}

This shows that the score at the 85th percentile is 2.674367666244506. If the response also returned a "doc_count" together with the value, I could get the number of my search hits that score above the 85th percentile.
I tried pipeline aggregations and sub-aggregations to get this number, but every attempt ended in an error.
Any idea if this is achievable?
Thanks!

Yifeng

Hi @amyc,

The document count is not available in the percentiles aggregation. It's not clear why you need the number of documents: the 80th percentile, for example, is the value that is greater than 80% of the observed values.

If you really need a number, you could estimate it with some math on the total hits of the search result, but please be aware that percentiles are approximate.
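
As a rough illustration of that math, here is a minimal Python sketch. The function name estimate_hits_above is made up for this example, and it assumes total_hits is the exact total reported by the search. Because percentiles are approximate, treat the result as an estimate only.

```python
def estimate_hits_above(total_hits: int, percent: float) -> int:
    """Estimate how many hits score above the given percentile.

    Hypothetical helper: by definition roughly (100 - percent)% of the
    observed values lie above the percentile value, so we scale the
    total hit count by that fraction. The result is only an estimate.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(total_hits * (100 - percent) / 100)


if __name__ == "__main__":
    # With 1000 total hits, about 15% of them are expected
    # to score above the 85th percentile.
    print(estimate_hits_above(1000, 85))
```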

Cheers,
LG

@luiz.santos:
Thanks for your reply.
My use case is to use the percentiles aggregation to filter out documents based on percentiles of the max score (not the _min_score), e.g. filtering out documents with a score below 80%. (See comment on issue #719.) If I had the document count together with the 80th percentile score value, I could achieve this by returning the correct total hits (using this document count) and looping over the returned hits, removing documents with a score below the percentile value. This would let me do it in one pass. Without it, I have to run a second pass.
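
The client-side loop described above could look roughly like the following Python sketch. It assumes the search response has already been parsed into a dict with the usual hits.hits list and the non-keyed aggregation shape shown earlier in this thread. The function name and the default aggregation name are only for illustration. Note that it can only filter the hits actually returned in the page, which is why the total would still be missing.

```python
def filter_hits_by_percentile(response: dict, agg_name: str = "my_agg") -> list:
    """Keep only the returned hits whose _score reaches the percentile value.

    Hypothetical helper: assumes a single requested percentile and the
    non-keyed response shape ("values" is a list of key/value objects).
    """
    threshold = response["aggregations"][agg_name]["values"][0]["value"]
    return [hit for hit in response["hits"]["hits"] if hit["_score"] >= threshold]


if __name__ == "__main__":
    # Hand-written sample response, not real cluster output.
    sample = {
        "hits": {"hits": [{"_id": "a", "_score": 3.1},
                          {"_id": "b", "_score": 2.0}]},
        "aggregations": {"my_agg": {"values": [{"key": 85, "value": 2.67}]}},
    }
    print([hit["_id"] for hit in filter_hits_by_percentile(sample)])
```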

Yifeng

Search filters such as min_score are applied locally to a shard in phase 1 of a search. The global range of scores produced using percentiles is not known until phase 1 results are merged from all shards. Therefore you will need 2 passes to do this sort of logic.
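
A two-pass flow along those lines might be sketched in Python as below. The sketch only builds request bodies as plain dicts and leaves sending them to whatever client you use. The function names are made up for this example, the aggregation mirrors the request from the original post, and "size": 0 in the first pass is an assumption to skip fetching hits when only the percentile value is needed.

```python
def first_pass_body(query: dict, percent: float) -> dict:
    """Pass 1: compute the score percentile only (no hits fetched)."""
    return {
        "size": 0,  # assumption: only the aggregation result is needed here
        "query": query,
        "aggregations": {
            "my_agg": {
                "percentiles": {
                    "script": "_score",  # same script as in the original post
                    "percents": [percent],
                    "keyed": False,
                }
            }
        },
    }


def second_pass_body(query: dict, first_response: dict) -> dict:
    """Pass 2: rerun the same query with min_score set to the percentile value."""
    threshold = first_response["aggregations"]["my_agg"]["values"][0]["value"]
    return {"query": query, "min_score": threshold}


if __name__ == "__main__":
    query = {"match_all": {}}  # placeholder query for the example
    # Hand-written stand-in for the first-pass response.
    fake_first = {"aggregations": {"my_agg": {"values": [{"key": 85, "value": 2.67}]}}}
    print(first_pass_body(query, 85))
    print(second_pass_body(query, fake_first))
```

With this approach the total hits of the second response already reflect the threshold, since min_score is applied on every shard.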
