Hello,
I use Elasticsearch 2.3.
I observed that with this simple query, the order of the must and must_not conditions in the query JSON matters:
{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must": { "term": { "group_id": 1 } },
          "must_not": {
            "regexp": { "category": { "value": ".*value.*", "flags": "NONE" } }
          }
        }
      }
    }
  }
}
When must goes first, the query is more than 10 times faster than when the must_not key goes first in the bool object. I think this makes sense: the must condition significantly reduces the number of documents under consideration, so filtering them by regexp afterwards is much faster than filtering by regexp first and then by group_id.
Now, when I run the same query for another group (e.g. group_id: 2), this optimisation doesn't work. Regardless of the order of must and must_not, it takes roughly the same amount of time, approximately equal to the time taken by the unoptimised query for the first group.
The number of documents in both groups is of the same order of magnitude, and the output is similar in that most records from the group are returned and only a small percentage of them are filtered out by the must_not condition.
My goal now is to compare the query execution profiles for both groups and figure out what makes group_id = 1 so special that the optimisation is so helpful, and eventually to figure out how to make the optimisation work for the second group (and all other groups).
I ran this query for both groups with profile: true but couldn't find any information that would help me figure this out.
I checked the Lucene query string produced for the second group, and regardless of whether the regexp condition comes before the group_id condition, the query takes roughly the same time.
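For comparing the two profiles side by side, a small helper that flattens each profile tree into rows may make differences easier to spot. This is only a sketch: it assumes the response layout of the 2.x Profile API (profile.shards[].searches[].query[], with query_type, lucene, time and children on each node), so the field names should be checked against the actual responses. The function name flatten_profile and the sample dictionary are hypothetical; the sample is a hand-written stand-in, not real output.

```python
def flatten_profile(response):
    """Yield (shard_id, depth, query_type, lucene, time) per profiled node.

    Field names are assumed from the 2.x profile response layout.
    """
    def walk(node, shard_id, depth):
        yield (
            shard_id,
            depth,
            node.get("query_type"),
            node.get("lucene"),
            node.get("time"),
        )
        for child in node.get("children", []):
            yield from walk(child, shard_id, depth + 1)

    for shard in response.get("profile", {}).get("shards", []):
        for search in shard.get("searches", []):
            for node in search.get("query", []):
                yield from walk(node, shard.get("id"), 0)


# Hand-written placeholder with the assumed shape; NOT real profiler output.
sample = {
    "profile": {
        "shards": [{
            "id": "<shard id>",
            "searches": [{
                "query": [{
                    "query_type": "BooleanQuery",
                    "lucene": "<lucene string>",
                    "time": "<time>",
                    "children": [
                        {"query_type": "TermQuery", "lucene": "<lucene string>", "time": "<time>"},
                        {"query_type": "RegexpQuery", "lucene": "<lucene string>", "time": "<time>"},
                    ],
                }]
            }],
        }]
    }
}

rows = list(flatten_profile(sample))
for shard_id, depth, query_type, lucene, time in rows:
    # Indent by depth so the tree structure stays visible.
    print("{}{} {} {}".format("  " * depth, query_type, time, lucene))
```

Running this over the responses for group_id 1 and group_id 2 and comparing the rows line by line would show whether the two queries are rewritten into the same Lucene query tree and where the time is reported.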
Do you have any ideas how I can figure out what makes the difference between those groups and eventually make this query faster for the second group?
Thank you in advance!