What is the order of evaluation of the TermsFilter (or the newer TermsQuery) when multiple TermsFilter are chained together?

swamirajamohan · December 9, 2015, 8:14am

for eg.     should { termsquery1, termsquery2}

And what is the order of evaluation in the event that the above TermsFilter(TermsQuery) is nested within another Filter?

for eg.    must{ termsquery2, should{ termsquery1,termsquery2}}

I would like to know if the order in which the terms are specified will matter with regards to query performance?

Specifically having termsquery2 as a filter, will it speed up evaluation of termsquery1 & termsquery2 owing to the fact termsquery1 & termsquery2 will operate on a smaller subset of the corpus.

In my use case.

Termsquery1 -> Large set of terms
Termsquery2 -> smaller set of terms 
and TermsQuery1 >> TermsQuery2 (greatly exceeeds)

jpountz · December 10, 2015, 6:05pm

The way queries work is by creating a Scorer that iterates in doc ID order over matching documents. A regular disjunction (the one you could create with a bool query) uses a heap in order to merge several iterators into a new iterator on the fly. However when there are many terms, the fact that the priority queue needs to be rebalanced on each iteration can make things very slow. This is why TermsQuery uses a different approach: instead of creating an iterator that will merge the sub iterators on the fly, it creates a bit set and then consumes all iterators and sets bits in the bit set for every visited document. Then it returns an iterator over the bit set.

This approach helps when there are many terms but it has a major downside: it needs to consume all doc IDs from all sub iterators all the time. This means that no matter if you intersect a terms query with a selective query or not, it will always do the same amount of work. Lucene protects against this problem by rewriting TermQuery instances to a ConstantScoreQuery around BooleanQuery when there are less than 16 terms (conservative heuristic), but when you have many terms there is not much you can do about it.

So to answer your question, the order does not matter when chaining, in all cases all terms queries will be evaluated against all documents from the index.

swamirajamohan · December 11, 2015, 8:18pm

Thanks for the explanation @jpountz. That helped a lot. I was suspecting that my use case will fall under a BitSet approach and not give any advantage via the filtering because of the number of terms involved.

Topic		Replies	Views
Inconsistent Term and Terms query resolution in filter context Elasticsearch	1	298	October 5, 2022
Filter performance difference (Bool vs Terms) Elasticsearch	2	4324	July 5, 2017
Can the order of filters impact performance? Elasticsearch	5	1916	July 6, 2017
Term vs terms filter Elasticsearch	5	16472	July 5, 2017
termQuery vs. termFilter Elasticsearch	6	665	July 6, 2017

What is the order of evaluation of the TermsFilter (or the newer TermsQuery) when multiple TermsFilter are chained together?

Related topics