Average of top n terms

roel · July 23, 2015, 2:18pm

Hello,

In a certain index documents have a keyword, a rank and a timestamp. The rank for a keyword may differ from time to time. This means the dataset may look like this:

{"keywords": "piano", "rank": 1, "timestamp": 1437642812}
{"keywords": "piano", "rank": 2, "timestamp": 1437642813}
{"keywords": "electric guitar", "rank": 5, "timestamp": 1437644326}

I would like to get the average rank of the 500 most frequently occurring keywords, but I cannot find out how to do this. My attempts so far always seem to give the average for the entire dataset.

Roel

colings86 · July 23, 2015, 2:29pm

In the current version this is not possible, but with pipeline aggregations coming in version 2.0 you will be able to use the avg_bucket aggregation to do this: https://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-pipeline-avg-bucket-aggregation.html
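For illustration, a request body using avg_bucket might look roughly like the sketch below, written as a Python dict. The aggregation names (top_keywords, avg_rank, avg_rank_of_top_keywords) are made up for this example, the field names are taken from the sample documents in the question, and it assumes the "keywords" field is not analyzed, so that "electric guitar" stays one term.

```python
import json


def build_avg_bucket_request(top_n=500):
    """Build a search body: top-N keywords, avg rank per keyword, then avg of those."""
    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "aggs": {
            # Hypothetical name: one bucket per keyword, most frequent first.
            "top_keywords": {
                "terms": {"field": "keywords", "size": top_n},
                "aggs": {
                    # Average rank within each keyword bucket.
                    "avg_rank": {"avg": {"field": "rank"}}
                },
            },
            # Sibling pipeline aggregation: averages avg_rank over the buckets above.
            "avg_rank_of_top_keywords": {
                "avg_bucket": {"buckets_path": "top_keywords>avg_rank"}
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_avg_bucket_request(), indent=2))
```

The printed JSON would be sent as the body of a search request against the index. The buckets_path value has to match the names chosen for the terms aggregation and its sub-aggregation.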

In the meantime, you would need to run a terms aggregation for the top 500 terms and perform the average calculation on the client side.
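A minimal sketch of that client-side step, in Python. It assumes the search used a terms aggregation (size 500) on "keywords" with an avg sub-aggregation on "rank"; the names top_keywords and avg_rank are hypothetical, and the sample response is hand-written from the three documents in the question rather than real output.

```python
def average_of_bucket_averages(response, terms_agg="top_keywords", metric="avg_rank"):
    """Average the per-keyword average ranks found in a terms aggregation response."""
    buckets = response["aggregations"][terms_agg]["buckets"]
    # Skip buckets whose metric has no value (e.g. no document had a rank).
    values = [b[metric]["value"] for b in buckets if b[metric]["value"] is not None]
    if not values:
        return None
    return sum(values) / len(values)


# Hand-written example shaped like an aggregation response (not real output).
sample_response = {
    "aggregations": {
        "top_keywords": {
            "buckets": [
                {"key": "piano", "doc_count": 2, "avg_rank": {"value": 1.5}},
                {"key": "electric guitar", "doc_count": 1, "avg_rank": {"value": 5.0}},
            ]
        }
    }
}

if __name__ == "__main__":
    print(average_of_bucket_averages(sample_response))  # prints 3.25
```

This weights every keyword equally, which is also what avg_bucket does. If each document should count equally instead, weight each bucket's value by its doc_count before dividing.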

roel · July 23, 2015, 2:59pm

Thank you for your answer.
I imagine this would work for a normal script, but is this also possible when I want to use the data for Kibana?

Roel

colings86 · July 23, 2015, 3:19pm

Yes, this would work in 2.0 for requests sent straight to Elasticsearch. However, it will take some time for the functionality to be added to the Kibana interface. The Kibana team are considering how to add it, though.