Hello,
I need to get the top-hits over an aggregation.
In my use case I store SQL queries as documents (with different statistics like the time it tooks). Each querie is hashed, so that I can make an aggregation term on it.
I then have "groups" of exact same sql queries, on which I can compute aggregations, like the average time each query took.
Now I want to get the top-k queries w.r.t mean time it tooks. How can I do so ?
If i use a bucket_sort, I know that elastic will sort buckets after having extracting them, and because the number of bucket is limited, I do not have the guarantee to have top-k on this aggregation metric.
This is the code I came with so far:
GET /sql_requests_parsed/_search/
{
"size": 0,
"query": {
"bool": {
"must": [
{
"exists": {"field": "request"}
}
]
}
},
"aggs": {
"same_requests": {
"terms": {
"field": "hashed_request",
"size": 2000
},
"aggs": {
"request": {
"top_hits": {
"size": 1
}
},
"mean_call": {
"avg": {
"field": "callDuration"
}
},
"sorting": {
"bucket_sort": {
"sort": [
{"mean_call": {"order": "desc"}}
]
}
}
}
}
}
}
NB: I also tried "collapse", which seemed to fit my usecase at first glance, but I can't aggregate on the resulting inner hits.