I'm observing some interesting behavior with boolean term queries in Elasticsearch that I'd like to understand better.
Each document in the index has several terms in the field ev_tags.
I'm issuing queries like the one below and have observed the following trend:
- Case 1: All the terms in the query match at least one document in the index. Here, query latency increases as the number of terms increases (while the number of matched documents stays the same). This is expected, since there are more posting lists to go through.
- Case 2: None of the terms in the query match any document. The query finishes very quickly and returns 0 documents, because the terms are not found in the inverted index.
- Case 3: This is the case that surprised me. When I added terms that match zero documents to terms that do match, the query latency INCREASED. For instance, I issued a term query with 100 terms, all of which have matching documents. I then added 1000 non-matching codes to the query, and the latency increased significantly. Given the performance in case 2, I assumed the 1000 codes would be discarded immediately because they are not in the inverted index, so the query should perform the same as the 100-term query.
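A minimal sketch for reproducing the three cases side by side. It assumes the extra terms are OR-ed as `term` clauses in a `bool`/`should` on ev_tags (the post does not say exactly where the 1000 codes were added, so that placement is an assumption). `build_body`, `cases`, and `median_latency_ms` are hypothetical helpers, and `search_fn` stands in for whatever client call actually sends the request; it is injected so the harness has no dependency of its own.

```python
import statistics
import time
from typing import Callable, Dict, List


def build_body(terms: List[str]) -> Dict:
    """Hypothetical helper: one `term` clause per ev_tags value, OR-ed in a should."""
    return {
        "size": 10,
        "sort": ["_doc"],
        "query": {
            "constant_score": {
                "filter": {
                    "bool": {
                        "should": [{"term": {"ev_tags": t}} for t in terms],
                        "minimum_should_match": 1,
                    }
                }
            }
        },
    }


def cases(matching: List[str], non_matching: List[str]) -> Dict[str, Dict]:
    """Bodies for case 1 (matching only), case 2 (non-matching only), case 3 (mixed)."""
    return {
        "case1_matching_only": build_body(matching),
        "case2_non_matching_only": build_body(non_matching),
        "case3_mixed": build_body(matching + non_matching),
    }


def median_latency_ms(
    search_fn: Callable[[Dict], object], body: Dict, repeats: int = 20
) -> float:
    """Median wall-clock time of `repeats` calls to search_fn(body), in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        search_fn(body)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

With the 7.x Python client, `search_fn` could be something like `lambda b: es.search(index="my-index", body=b)` (`my-index` is a placeholder). Wall-clock timing includes network overhead, and repeated runs of the same filter may be served from Elasticsearch's query cache, so first-run and repeated timings can differ.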
What could explain this behavior? Any help is much appreciated.
I'm using Elasticsearch 7.16.3.
Query:

```json
{
  "_source": false,
  "stored_fields": ["field1", "field2", "field3"],
  "from": 0,
  "size": 10,
  "sort": ["_doc"],
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            { "term": { "slice": "0" } },
            { "term": { "ev_tags": "<ev_term_1>" } },
            { "term": { "ev_tags": "<ev_term_2>" } },
            {
              "bool": {
                "should": [
                  { "term": { "ev_tags": "<ev_term_3>" } }
                ]
              }
            }
          ]
        }
      }
    }
  }
}
```
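To vary the number of terms while keeping the rest of the request fixed, the body above can be generated programmatically. This is a sketch only: `build_query` and its parameters are hypothetical names, and it simply rebuilds the structure shown (slice and required tags in `must`, optional tags in a nested `bool`/`should`). Where the extra non-matching codes belong (for example, appended to `optional_tags`) is an assumption, since the post does not specify it.

```python
import json
from typing import Dict, List, Sequence


def build_query(
    slice_id: str,
    required_tags: Sequence[str],
    optional_tags: Sequence[str],
    stored_fields: Sequence[str] = ("field1", "field2", "field3"),
    size: int = 10,
) -> Dict:
    """Rebuild the query above: the slice and every required tag must match,
    plus a nested bool whose should list holds the optional tags."""
    must: List[Dict] = [{"term": {"slice": slice_id}}]
    must += [{"term": {"ev_tags": t}} for t in required_tags]
    must.append(
        {"bool": {"should": [{"term": {"ev_tags": t}} for t in optional_tags]}}
    )
    return {
        "_source": False,
        "stored_fields": list(stored_fields),
        "from": 0,
        "size": size,
        "sort": ["_doc"],
        "query": {"constant_score": {"filter": {"bool": {"must": must}}}},
    }


if __name__ == "__main__":
    # Reproduces the query from the post with its placeholder terms.
    body = build_query("0", ["<ev_term_1>", "<ev_term_2>"], ["<ev_term_3>"])
    print(json.dumps(body, indent=2))
```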