Elastic version: 6.3.2
Documents: ~120,000,000
We're seeing very odd behavior in the order in which Elasticsearch appears to run filter and query clauses. Given the following query:
POST /index/_search
{
  "from": 0,
  "size": 5,
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "customer": "customer-id"
          }
        },
        {
          "query_string": {
            "fields": [
              "filename"
            ],
            "query": "*term*"
          }
        }
      ]
    }
  }
}
Running this query takes ~3.5 seconds.
With the customer term clause removed, it still takes ~3.5 seconds.
With the query_string clause removed, it takes 35 ms and returns ~1,500 results (down from 120M total docs in the index).
It seems like the query_string operation is being run before the term filter since it takes 3.5 seconds with or without the term filter.
My question is: why doesn't Elasticsearch apply the term filter first, so that the more expensive query_string operation only runs on ~1,500 documents instead of the full 120M?
Ideas I've had but haven't been able to make work yet:
- Rearranging the query to get the term filter to run before the query_string. I've tried different variations of nesting must/should/bool.
- Passing some type of hint to the query processor so that it knows to run the term filter first. I haven't found any way to do this.
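To illustrate the first idea, here is a sketch of one such variation (not necessarily one of the exact queries tried), sent as the body of the same POST /index/_search request. It assumes the same index and field names as above and moves the term clause out of "must" into the bool query's "filter" clause:

```json
{
  "from": 0,
  "size": 5,
  "query": {
    "bool": {
      "filter": [
        { "term": { "customer": "customer-id" } }
      ],
      "must": [
        {
          "query_string": {
            "fields": ["filename"],
            "query": "*term*"
          }
        }
      ]
    }
  }
}
```

In filter context the term clause is not scored and can be cached. Clause position inside the bool query is declarative, though, so this alone is not a way to force the term clause to run first.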
Also, I realize leading wildcards (as in *term*) cause performance issues, but I don't think that's what's happening here.
Any insight would be appreciated.
Nick