Leaderboard Style Aggregation Query Question


(Maxwell Talbot) #1

Hi All

I am trying to get a query that provides a leaderboard across multiple games and a is flexible in terms of time range.

I have an index with scores where each document looks like this

{ timestamp: 2015-09-01 00:00:01, score: 56, game: 10, user: 5}
{ timestamp: 2015-09-01 00:00:02, score: 26, game: 10, user: 5}
{ timestamp: 2015-09-01 00:00:03, score: 83, game: 11, user: 5}
{ timestamp: 2015-09-01 00:00:04, score: 2, game: 10, user: 5}
{ timestamp: 2015-09-01 00:00:05, score: 2, game: 10, user: 1}
{ timestamp: 2015-09-01 00:00:06, score: 2, game: 10, user: 1}
{ timestamp: 2015-09-01 00:00:07, score: 2, game: 10, user: 1}
{ timestamp: 2015-09-01 00:00:08, score: 2, game: 10, user: 1}
{ timestamp: 2015-09-01 00:00:09, score: 2, game: 10, user: 1}

ie. the sum of top scores for each game is your 'total' score. the leaderboard should return users based on this total score. so for user 5 his score should be 139, and user 1 would have a score of 2.

so far I've come up with something along these lines

{
    "query":{
        ...
    },
    "aggregations": {
        "users":{
            "terms": {
                "field":"user_id",
                "size":100
            },
            "aggregations":{
                "games":{
                    "terms": {
                        "field":"game",
                        "size":100
                    },
                    "aggregations":{
                        "top_score":{
                            "max": {
                                "field": "score"
                            }
                        }
                    }
                }
            }
        }
    }
} 

The problem with this query is that it will return the users with the most scores first, so in the example data user 1 would come above user 2.

(I guess i mean problems other than on the number of games it would support and the performance cost of bucketing the bucket making it somewhat quite un elastic)

My next step would probably be to look at a different style of index (for example, if i just needed an all time index i would look into only adding top scores into the index, but the time range gets in the way) or pre-caching leaderboards for given time ranges, or removing the time range flexibility all together.

It also seems like using sum_bucket (still in beta) could help with my problem, as long as we can sort the first bucket on the sum_bucket.

I hope this isnt completely misplaced here, feel free to point me in the right direction if so

I'm new to elasticsearch so i'm not sure if I'm missing something obvious, but it seems likely, anyways thanks for reading this far and any help would be appreciated.

Max


(system) #2