My SQL is rusty, but this is a fairly close approximation:
{
"query": {
"filtered": {
"filter": {
"range": {
"session_time": {
"gte": 1441652400,
"lte": 1442257200
}
}
}
}
},
"aggs": {
"user_country_agg": {
"terms": {
"field": "user_country"
},
"aggs": {
"game_name_agg": {
"terms": {
"field": "game_name"
},
"aggs": {
"proxy_country": {
"terms": {
"field": "proxy_country"
},
"aggs": {
"game_server_country_agg": {
"terms": {
"field": "game_server_country"
},
"aggs": {
"total_sessions": {
"value_count": {
"field": "user_country"
}
},
"positive_sessions": {
"sum": {
"script" : "doc['avg_ping'].value < avg_line_ping ? 1 : 0"
}
},
"negative_sessions": {
"sum": {
"script" : "doc['avg_ping'].value > avg_line_ping ? 1 : 0"
}
}
}
}
}
}
}
}
}
}
}
}
Note that it doesn't perform sorting, so if that is critical you may need a different solution. Currently, ES doesn't let you sort a terms
agg based on the value of a multi-valued child agg (e.g. an embedded terms
).
Also note that deeply embedded terms
aggs can quickly cause problems due to high branching factor. E.g. if each terms
agg has 50 terms, and you have four levels, that is 50^4 = 6,250,000
buckets. You may need to use a different collect_mode
to prevent memory problems
If sorting is critical, you could try something like this instead:
{
"query": {
"filtered": {
"filter": {
"range": {
"session_time": {
"gte": 1441652400,
"lte": 1442257200
}
}
}
}
},
"aggs": {
"user_country_agg": {
"terms": {
"script" : "doc['user_country'].value + '_' + doc['game_name_agg'].value + '_' + doc['proxy_country'].value + '_' + doc['game_server_country_agg'].value",
"order": {
"total_sessions": "desc"
}
},
"aggs": {
"total_sessions": {
"value_count": {
"field": "user_country"
}
},
"positive_sessions": {
"sum": {
"script" : "doc['avg_ping'].value < avg_line_ping ? 1 : 0"
}
},
"negative_sessions": {
"sum": {
"script" : "doc['avg_ping'].value > avg_line_ping ? 1 : 0"
}
}
}
}
}
}
E.g instead of nesting aggregations, you build your grouping key with a script by concatenating all the values together. Then you can sort on total_sessions
since it is a single-valued agg
As general advice, directly translating SQL to Elasticsearch is not always the best idea. They operate on different principles, so you may have a hard time or bad performance trying to treat ES as a relational store.