Percentiles aggregation does not behave as expected

iamredlus · January 20, 2019, 7:52pm

Hey

I've been experimenting with percentile aggregations on a dataset after getting some odd results in Elasticsearch 6.3.2.

My dataset consists of 467 floating point numbers, which can be found here. To rule out differences between field type implementations, I've ingested this dataset (each value as a single document) into two indices: one that defines the field as a float, and one that defines it as a scaled_float with "scaling_factor": 1000. The percentiles requested are "percents": [50, 90, 95, 99, 99.99]. In both cases, the results are very similar to each other and very different from the expected percentiles (vs. Python's NumPy):
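As a rough illustration of how the expected values on the NumPy side can be computed, here is a minimal pure-Python sketch of linear interpolation between closest ranks, which is the method NumPy's percentile function uses by default. The `sample` list is made up for the example and is not the 467-value dataset; `percentile_linear` is a hypothetical helper name.

```python
import math

def percentile_linear(values, percent):
    """Exact percentile using linear interpolation between closest ranks."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Fractional position of the requested percentile in the sorted list.
    rank = (len(ordered) - 1) * percent / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

# Made-up values, standing in for the real dataset.
sample = [0.12, 0.35, 0.5, 0.75, 1.8, 2.4, 3.1, 9.7]
percents = [50, 90, 95, 99, 99.99]

expected = {p: percentile_linear(sample, p) for p in percents}
for p, value in expected.items():
    print(f"p{p}: {value:.4f}")
```

Comparing these exact values with what the aggregation returns for the same percents gives the per-percentile error.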

As documented, I know percentiles are approximated. However, it seems that the approximation behaves exactly opposite to what is written in the documentation:

Accuracy is proportional to q(1-q). This means that extreme percentiles (e.g. 99%) are more accurate than less extreme percentiles, such as the median.

For small sets of values, percentiles are highly accurate (and potentially 100% accurate if the data is small enough).
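To make the quoted claim concrete, here is a small sketch that evaluates q(1-q) for the requested percentiles. It assumes the term is read as a relative scale for the approximation error (smaller means tighter); `error_scale` is a hypothetical helper name, not part of any Elasticsearch API.

```python
def error_scale(percent):
    """Return q(1-q) for a percentile given in percent (0-100)."""
    q = percent / 100.0
    return q * (1.0 - q)

percents = [50, 90, 95, 99, 99.99]
scales = {p: error_scale(p) for p in percents}

for p, scale in scales.items():
    print(f"p{p}: q(1-q) = {scale:.6f}")
```

Under that reading, the term peaks at the median and shrinks towards the tails, so the documentation predicts the tightest estimates at the extreme percentiles.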

My dataset is pretty small (only 467 samples), and the accuracy increases as we approach the median. How can this be explained?

Christian_Dahlqvist · January 21, 2019, 7:49am

How many shards does your index have? If you are using the default 5 primary shards, can you try with a single primary shard as well?

iamredlus · January 21, 2019, 9:46am

Hey @Christian_Dahlqvist, in both cases it was tested on a single shard.

iamredlus · January 23, 2019, 10:46am

Hey @Christian_Dahlqvist
What else can we check on the issue?

Mark_Harwood · January 23, 2019, 11:23am

Have you tried the settings to balance memory usage and accuracy?

iamredlus · January 23, 2019, 11:48am

Yep, I've experimented with different levels of the compression parameter (up to 2,000), but it does not seem to have any effect (maybe the sample size is small enough such that additional 'nodes' do not add accuracy).
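For anyone wanting to reproduce this, a minimal sketch of a search body with the compression parameter set might look like the following. The field name `value` and the aggregation name `value_percentiles` are placeholders, and `build_percentiles_request` is a hypothetical helper; only the `percentiles` aggregation with its `tdigest.compression` option comes from the documentation.

```python
import json

def build_percentiles_request(field, percents, compression):
    """Build a search body for a percentiles aggregation with TDigest compression."""
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "aggs": {
            "value_percentiles": {  # placeholder aggregation name
                "percentiles": {
                    "field": field,
                    "percents": percents,
                    "tdigest": {"compression": compression},
                }
            }
        },
    }

# "value" is a placeholder field name; 2000 is the highest compression tried above.
body = build_percentiles_request("value", [50, 90, 95, 99, 99.99], 2000)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `_search` request against either index.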

iamredlus · February 12, 2019, 3:22pm

I've opened an issue on GitHub about this -

system · March 12, 2019, 3:22pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.