I'm not 100% sure what you are after, but if you want the average number of clicks per user, that would be the total number of clicks recorded divided by the number of unique users. Both of these numbers are returned in the results of your last example query, in the JSON fields hits/total and aggregations/uniques user/value.
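As a rough illustration of that arithmetic, here is a minimal Python sketch that reads both numbers from an already-parsed search response. The response dict is hand-written, and the aggregation name `unique_users` is a hypothetical stand-in for whatever your query calls it. `hits.total` is a plain integer in older Elasticsearch versions and an object with a `value` key in newer ones, so the helper accepts both. Keep in mind that the cardinality aggregation returns an approximate count.

```python
from typing import Any, Dict


def total_hits(response: Dict[str, Any]) -> int:
    """Return hits.total, which is an integer in older Elasticsearch versions
    and an object like {"value": N, "relation": "eq"} in newer ones."""
    total = response["hits"]["total"]
    return total["value"] if isinstance(total, dict) else int(total)


def average_clicks_per_user(response: Dict[str, Any],
                            cardinality_agg: str = "unique_users") -> float:
    """Total click documents divided by the (approximate) number of distinct users.
    `cardinality_agg` is a hypothetical aggregation name."""
    unique_users = response["aggregations"][cardinality_agg]["value"]
    if unique_users == 0:
        return 0.0
    return total_hits(response) / unique_users


# Hypothetical response, for illustration only.
example_response = {
    "hits": {"total": 1200},
    "aggregations": {"unique_users": {"value": 300}},
}

print(average_clicks_per_user(example_response))  # 4.0
```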
I know how to compute this outside of Elasticsearch if I send both of the queries I mentioned above.
What I'm trying to do is the following:
Suppose that user i made Y_i clicks, which means we have Y_i observations with userId = i.
What I'm trying to compute is the density of user X's clicks Y over all the observations.
I compute Y_i for a specific user i with the terms aggregation described above.
U is the number of unique users, computed with the second (cardinality) aggregation.
I want to compute Y_i / U for the top 10 users.
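For reference, Y_i / U as defined above can be computed client-side from a single response that carries both aggregations. This is a sketch under assumptions: the terms aggregation on userId is named `top_users` and the cardinality aggregation `unique_users` (both hypothetical names), and the response is a made-up example rather than real output.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical request body: top 10 users by click count plus
# an (approximate) count of distinct users.
request_body = {
    "size": 0,
    "aggs": {
        "top_users": {"terms": {"field": "userId", "size": 10}},
        "unique_users": {"cardinality": {"field": "userId"}},
    },
}


def clicks_over_unique_users(response: Dict[str, Any]) -> List[Tuple[str, float]]:
    """Return (userId, Y_i / U) for every bucket of the terms aggregation."""
    u = response["aggregations"]["unique_users"]["value"]
    buckets = response["aggregations"]["top_users"]["buckets"]
    return [(b["key"], b["doc_count"] / u) for b in buckets]


# Hypothetical response, for illustration only.
example_response = {
    "hits": {"total": 1200},
    "aggregations": {
        "top_users": {"buckets": [
            {"key": "alice", "doc_count": 90},
            {"key": "bob", "doc_count": 60},
        ]},
        "unique_users": {"value": 300},
    },
}

print(clicks_over_unique_users(example_response))  # [('alice', 0.3), ('bob', 0.2)]
```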
> U is the number of unique users
> Y_i / U for the top 10 users.
U here is a constant. So you want to scale all of the doc_counts reported for each of the top 10 users by this constant?
I don't see why this would be useful. It's like finding the top 10 vehicles with the most miles on the clock and then dividing those numbers by the total number of car manufacturers. It doesn't appear to serve any purpose.
Can we start by stating the business problem you are trying to solve rather than how you intend to solve it?
I'm not interested in the business value.
I want to compute doc_count / number of hits, where the bucket keys are the userId values.
This changes the previous definition: Y_i / U is not the same as doc_count / number of hits.
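To make the difference concrete, here is a small sketch computing both ratios from the same hypothetical numbers. doc_count / hits.total is each user's share of all clicks, whereas Y_i / U divides by the number of distinct users, so the two only coincide if hits.total happens to equal U. The function name `ratios` and all figures are illustrative assumptions.

```python
from typing import Dict, Tuple


def ratios(doc_counts: Dict[str, int], total_clicks: int,
           unique_users: int) -> Dict[str, Tuple[float, float]]:
    """For each user, return (doc_count / hits.total, doc_count / U)."""
    return {
        user: (count / total_clicks, count / unique_users)
        for user, count in doc_counts.items()
    }


# Hypothetical inputs: hits.total, cardinality value U, top-user buckets.
result = ratios({"alice": 90, "bob": 60}, total_clicks=1200, unique_users=300)
for user, (share_of_clicks, per_unique_user) in result.items():
    print(f"{user}: doc_count/hits={share_of_clicks:.3f}  Y_i/U={per_unique_user:.3f}")
```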
U can change over time; that's why I don't want to treat it as a constant.
I meant that it can be treated as a constant for the purposes of your single request. It's equivalent to saying "I want to multiply all reported doc_counts by 0.234235": it is an arbitrary fixed boost applied to rebase all doc_count values, and it does nothing to change the ranking order used to select the top 10 users.
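The point about ranking can be checked in a few lines of Python: dividing every doc_count by the same positive constant (here a made-up U) changes the numbers but not their order, so the same top users are selected either way. The counts and the helper `top_n` are hypothetical.

```python
from typing import Dict, List


def top_n(scores: Dict[str, float], n: int = 10) -> List[str]:
    """Return the keys of the n highest scores, highest first."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [key for key, _ in ranked[:n]]


doc_counts = {"u1": 42, "u2": 17, "u3": 99, "u4": 5}
u = 300  # hypothetical number of unique users; any positive constant works

scaled = {key: count / u for key, count in doc_counts.items()}
assert top_n(doc_counts) == top_n(scaled)  # same ranking before and after scaling
print(top_n(scaled))  # ['u3', 'u1', 'u2', 'u4']
```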
We do see real-world examples of Elasticsearch being used on click data, and there are powerful analysis techniques available, but unfortunately they do not extend to your example of rebasing doc_count numbers for display purposes.