I'm running a very simple query that uses a range on the @timestamp field (type: date).
For some reason, the Profile API shows that it is rewritten into multiple TermQuery clauses on this field.
I was wondering why that is happening, and is it supposed to be faster than a range query?
Good question! So this is due to some internal optimizations that Lucene makes. The summary can be found in the comment header of MultiTermQueryConstantScoreWrapper:
It tries to rewrite per-segment as a boolean query
that returns a constant score and otherwise fills a
bit set with matches and builds a Scorer on top of
this bit set.
Basically, the range is evaluated on each individual segment. If the segment only holds a small number of matching terms (16 or fewer), it rewrites the range into a boolean query of individual terms. If the segment matches a larger number of terms, it generates a bitset and iterates over that as a "normal" range.
The reason is down to speed: generating a bitset for all the documents in an index takes a certain amount of time. If there are not many terms to evaluate (which we can determine based on the term-dictionary for the segment), it's faster to skip the bitset generation and just check the terms individually with a boolean.
But booleans slow down as there are more terms to evaluate, so at some point it makes sense to pay the cost of building the bitset: with many terms to check, we make that time back during the range evaluation.
If you re-run your profile against data where each segment matches many terms, you'll see the output change.
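To make the per-segment decision above concrete, here is a rough sketch in Python. It is not Lucene's actual code: the function name, the `BOOLEAN_REWRITE_THRESHOLD` constant, and the list/dict stand-ins for the term dictionary and postings are all made up for illustration, and it collects every matching term up front, which the real implementation need not do. Only the threshold of 16 comes from the explanation above.

```python
from bisect import bisect_left, bisect_right

# Hypothetical name; the value 16 is the threshold described above.
BOOLEAN_REWRITE_THRESHOLD = 16


def rewrite_range_for_segment(sorted_terms, postings, low, high, max_doc):
    """Decide how to evaluate the range [low, high] on a single segment.

    sorted_terms: sorted list standing in for the segment's term dictionary
    postings:     dict mapping each term to the doc ids that contain it
    max_doc:      number of documents in the segment
    """
    # Find the terms in the dictionary that fall inside the range.
    start = bisect_left(sorted_terms, low)
    end = bisect_right(sorted_terms, high)
    matching = sorted_terms[start:end]

    # Few terms: skip the bitset and check the terms individually.
    if len(matching) <= BOOLEAN_REWRITE_THRESHOLD:
        return ("boolean", [("term", t) for t in matching])

    # Many terms: pay the cost of building a bitset once, then iterate it.
    bits = [False] * max_doc
    for term in matching:
        for doc_id in postings[term]:
            bits[doc_id] = True
    return ("bitset", bits)


# A segment with only three terms in range is rewritten to term clauses.
small_terms = [10, 20, 30, 40, 50]
small_postings = {t: [i] for i, t in enumerate(small_terms)}
small_kind, small_result = rewrite_range_for_segment(
    small_terms, small_postings, 15, 45, max_doc=5
)

# A segment with fifty terms in range falls back to the bitset.
large_terms = list(range(100))
large_postings = {t: [t] for t in large_terms}
large_kind, large_result = rewrite_range_for_segment(
    large_terms, large_postings, 10, 59, max_doc=100
)
```

In this sketch the first call returns three term clauses, which is the shape that shows up in a profile as multiple TermQuery entries, while the second call returns a bitset with 50 bits set.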
Also note: in 5.0+, the Lucene output is much friendlier. It won't spam a bunch of binary terms, but will instead show a simple [0 TO 10] output.
Hm, I have the problem that this behavior causes an otherwise simple query to immediately overflow my search queue (capacity 1000) and then cause 4000 rejections, making the whole system unusable for a while.
I think you're encountering a different, unrelated problem. The query expansion/rewrite process still occurs in a single search context, i.e. under a single thread. The process described above won't fill up your search queue.
The search queue is filling up due to multiple concurrent queries that are being executed, not because of one query that is "expanding" to multiple search contexts. I'd suggest opening a thread about your problem to get more help, since it's likely unrelated to this thread.