Beginner - How to aggregate count on large cardinality?

Imagine a game with 50 levels.
You want to get the distribution (average, standard deviation, max, min, etc.) of attempts per player for each level.
I have a play_level_end event with a unique_id parameter that uniquely identifies each player.
I first aggregate with terms on the levels:
"aggs": {
  "levels": {
    "terms": {
      "field": "e.i.level",
      "size": 100
    }
  }
}
but then I'm stuck. I want to split by unique_id and count the number of docs, but I can't figure out how to bucket per unique user.
Any idea?
I do not want to divide the total number of attempts by the unique count of players, as doing this I would lose key information on the distribution.
The histogram agg will not take my unique_id, and terms, well, will refuse my high cardinality.
Thank you for any help to a noob, it's good for your karma :wink:

Behavioural analysis like this often requires an entity-centric index. Most Elasticsearch indices are like your example: event-centric. This link should help explain what the issues are and how to make an entity-centric index from your logged events.
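
To make the idea concrete, here is a minimal sketch in plain Python (no Elasticsearch involved) of what an entity-centric view buys you: events are first folded into one count per (player, level) entity, and only then summarised per level. The field names `unique_id` and `level` and the sample events are assumptions for illustration, not the real mapping.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def attempts_per_player(events):
    """Fold event-centric records into one count per (player, level) entity."""
    counts = Counter()
    for event in events:
        counts[(event["unique_id"], event["level"])] += 1
    return counts


def level_distribution(counts):
    """Summarise the attempts-per-player values for each level."""
    per_level = defaultdict(list)
    for (_player, level), attempts in counts.items():
        per_level[level].append(attempts)
    return {
        level: {
            "players": len(values),
            "min": min(values),
            "max": max(values),
            "avg": mean(values),
            "std_dev": pstdev(values),  # population standard deviation
        }
        for level, values in per_level.items()
    }


# Hypothetical play_level_end events: p1 tried level 1 three times, p2 once.
events = [
    {"unique_id": "p1", "level": 1},
    {"unique_id": "p1", "level": 1},
    {"unique_id": "p1", "level": 1},
    {"unique_id": "p2", "level": 1},
    {"unique_id": "p2", "level": 2},
]
summary = level_distribution(attempts_per_player(events))
print(summary)
```

The first function is the step that plain aggregations on the raw events struggle with at high cardinality; once each entity carries its own attempt count, the per-level statistics are cheap.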


Hello from Stockholm, and nice talk!

Got it. This intelligence needs to be baked into our log-ingestion scripts and cannot be obtained from aggregations on the raw data.
Of course, the alternative is for the reporting client to add more info itself: on every event, add duration since session start, on every level attempt add attempt number, etc.

Thx!

Hello from London, and thanks!

Yep that would make sense. What you want to avoid is generating multiple records saying "player 1 made 1 attempt" and then a different document saying "player 1 made 2 attempts" because then you'd have a de-duplication issue to fix. Make sure when you index the document you provide an id for the document which is a combination of player ID and level so that you are inserting or updating the same document rather than just generating new ones.
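
A rough sketch of that tip, assuming a hypothetical index called `player_level_attempts` with `unique_id`, `level` and `attempts` fields (none of these names come from your mapping). It only builds the request bodies as Python dicts, so you can send them with whichever client you use. The update body increments the counter when the document exists and otherwise inserts the `upsert` document; the query then asks for per-level statistics with a terms agg plus an `extended_stats` sub-aggregation.

```python
import json

INDEX_NAME = "player_level_attempts"  # hypothetical entity-centric index


def entity_doc_id(player_id, level):
    """Deterministic ID: the same player and level always map to one document."""
    return f"{player_id}:{level}"


def attempt_update(player_id, level):
    """Return (doc_id, body) for an update-with-upsert request.

    If the document is missing, the `upsert` document is indexed as-is
    (attempts = 1); otherwise the script increments the existing counter.
    """
    body = {
        "script": {"lang": "painless", "source": "ctx._source.attempts += 1"},
        "upsert": {"unique_id": player_id, "level": level, "attempts": 1},
    }
    return entity_doc_id(player_id, level), body


def level_stats_query(max_levels=100):
    """Search body: per-level distribution of attempts over entity documents."""
    return {
        "size": 0,  # only the aggregation results are needed
        "aggs": {
            "levels": {
                "terms": {"field": "level", "size": max_levels},
                "aggs": {"attempts": {"extended_stats": {"field": "attempts"}}},
            }
        },
    }


doc_id, body = attempt_update("player-1", 7)
print(f"POST /{INDEX_NAME}/_update/{doc_id}")
print(json.dumps(body, indent=2))
print(f"POST /{INDEX_NAME}/_search")
print(json.dumps(level_stats_query(), indent=2))
```

Because the terms agg now runs on the level field of the entity documents, the high-cardinality unique_id never has to be bucketed at query time.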

Thanks for the tip!

Though in our case, there's some qualitative data that we want to keep for each attempt. For example, the score. We wouldn't want to update score with each attempt.
