I am trying to measure the number of users from different countries by reading the events_log_index ES, this index has all the user interactions from a mobile. Therefore the user might have several records in the event index.
some of the fields in the mapping include:
- sid: session_id
- uid: user_id
- location: nested object containing the users country and city
{
"properties": {
"city": {
"type": "keyword"
},
"country_long": {
"type": "keyword"
},
"country_short": {
"type": "keyword"
},
"ip": {
"type": "keyword"
},
"loc": {
"type": "geo_point"
},
"region": {
"type": "keyword"
}
}
}
I want to know the number of unique users from different countries and then different cities in these countries, so to get the number of users from different countries I used the following query:
{
"size": 0,
"query": {
"match_all":{}
},
"aggs" : {
"users_per_country" : {
"terms" : {
"field": "location.country_long",
"size": 2,
"order": {
"_count": "desc"
}
},
"aggs": {
"distinct_users": {
"cardinality": {
"field": "uid"
}
}
}
}
}
}
The results were as follows:
{
"took": 3702,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 10000,
"relation": "gte"
},
"max_score": null,
"hits": []
},
"aggregations": {
"users_per_country": {
"doc_count_error_upper_bound": 14782,
"sum_other_doc_count": 847455,
"buckets": [
{
"key": "Egypt",
"doc_count": 35337333,
"distinct_users": {
"value": 434430
}
},
{
"key": "Jordan",
"doc_count": 4649252,
"distinct_users": {
"value": 65504
}
}
]
}
}
}
So then I wanted to check for each city in one of these countries the distinct number of users, so I used:
{
"size": 0,
"query": {
"term": {
"location.country_long": "Egypt"
}
},
"aggs" : {
"users_per_country" : {
"terms" : {
"field": "location.city",
"size": 5,
"order": {
"_count": "desc"
}
},
"aggs": {
"distinct_users": {
"cardinality": {
"field": "uid"
}
}
}
}
}
}
the result was as follows:
{
"took": 3034,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 10000,
"relation": "gte"
},
"max_score": null,
"hits": []
},
"aggregations": {
"users_per_country": {
"doc_count_error_upper_bound": 236846,
"sum_other_doc_count": 10455532,
"buckets": [
{
"key": "Al Jizah",
"doc_count": 8552318,
"distinct_users": {
"value": 222287
}
},
{
"key": "Cairo",
"doc_count": 6923425,
"distinct_users": {
"value": 188695
}
},
{
"key": "Al Khankah",
"doc_count": 4539083,
"distinct_users": {
"value": 151461
}
},
{
"key": "Alexandria",
"doc_count": 3901927,
"distinct_users": {
"value": 125361
}
},
{
"key": "Al Mansurah",
"doc_count": 965048,
"distinct_users": {
"value": 36865
}
}
]
}
}
}
The main issue is that comparing the total number of users from Egypt to the sum of users in each bucket from the city does not add up, so you can please advise how to construct this query to get the desired result?