Good afternoon,

We are struggling to debug a slow terms aggregation.
Index mapping:

```
"tags_keyword": {
  "type": "keyword"
}
```

where `tags_keyword` holds an array of lowercased keyword terms.
Query:

```
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "doc_type": "product" } },
        { "term": { "shop_id": 1 } }
      ]
    }
  },
  "aggs": {
    "tags": {
      "terms": {
        "field": "tags_keyword"
      }
    }
  }
}
```
We are trying to get the top 10 tags for a particular filtered query. We use the filter clause to reduce the data set and the aggs clause to compute the aggregation. Additionally, routing is attached to this request, which forces the query to run against a single shard.
The weird part: if we run only the query clause, without aggs, we get responses in about 20 ms. If we add the aggs clause, responses take 16 to 20 s. It is as if the aggregation runs against the entire shard's data (13 million records) and is only filtered afterwards, instead of first filtering the result set down (816 records) and then aggregating on it.
We also tried a filter aggregation, removing the entire query clause and putting the filters inside `aggs.filter` instead, and got the same result. Like so:
```
GET /products/_search?routing=1
{
  "size": 1,
  "profile": true,
  "aggs": {
    "t_shirts": {
      "filter": { "term": { "shop_id": 1 } },
      "aggs": {
        "tags": {
          "terms": {
            "field": "tags_keyword"
          }
        }
      }
    }
  }
}
```
Here is the very curious part: if we remove the routing from the URL and profile the request in Kibana, each shard's maximum response time is about 1.7 s, not 16 to 20 s. Only when all of the shards' runtimes are added together do we reach 16 to 20 s.
I could literally write a loop that aggregates the tags of those 816 documents in far less than 20 s. It looks like the aggregation is not being pre-filtered. Is this an Elasticsearch bug?
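For comparison, here is a minimal sketch of that client-side loop in Python. It assumes the 816 matching hits have already been fetched with the plain filter query, and that each hit's `_source` carries the `tags_keyword` array. The function name `top_tags` is hypothetical. Like a terms aggregation, it counts each document at most once per distinct tag, and it breaks ties alphabetically.

```python
from collections import Counter
from typing import Iterable


def top_tags(hits: Iterable[dict], size: int = 10) -> list[tuple[str, int]]:
    """Return the `size` most common tags across the given search hits.

    Each hit is assumed to look like {"_source": {"tags_keyword": [...]}}
    (hypothetical shape, mirroring a normal search hit).
    """
    counts: Counter = Counter()
    for hit in hits:
        tags = hit.get("_source", {}).get("tags_keyword", [])
        # Count documents, not occurrences: a tag repeated within one
        # document's array only adds 1 for that document.
        counts.update(set(tags))
    # Sort by count descending, then tag ascending, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:size]


if __name__ == "__main__":
    sample_hits = [
        {"_source": {"tags_keyword": ["cotton", "red", "red"]}},
        {"_source": {"tags_keyword": ["cotton", "blue"]}},
        {"_source": {"tags_keyword": ["cotton", "red"]}},
    ]
    print(top_tags(sample_hits, 2))  # [('cotton', 3), ('red', 2)]
```

With only a few hundred hits, this is a single pass over a small list, which is why the 16 to 20 s aggregation time seems so out of proportion.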
What is going on here?