ES rewriting range on @timestamp to BooleanQuery / TermQuery - Why?

Hi,

I'm running a very simple query that uses a range on the @timestamp field (type: date).
For some reason, the Profile API shows that it is rewritten into multiple TermQuery clauses on this field.
I was wondering why that happens, and whether it is supposed to be faster than a range query?

Query
{
  "profile": "true",
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "analyze_wildcard": true,
          "query": "*"
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "@timestamp": {
                  "gte": "1468829704652",
                  "lte": "1469434504652"
                }
              }
            }
          ],
          "must_not": []
        }
      }
    }
  },
  "size": 500,
  "fields": [
    "*",
    "_source"
  ],
  "script_fields": {}
}

The Profile API output shows many entries like this:

  {
    "query_type": "BooleanQuery",
    "lucene": "@timestamp:0 \u0000\u0000\nX0o @timestamp:0 \u0000\u0000\nX0p @timestamp:0 \u0000\u0000\nX0q @timestamp:0 \u0000\u0000\nX0r @timestamp:0 \u0000\u0000\nX0s @timestamp:0 \u0000\u0000\nX0t @timestamp:0 \u0000\u0000\nX3,",
    "time": "0.5339960000ms",
    "breakdown": {
      "score": 0,
      "create_weight": 42867,
      "build_scorer": 33021,
      "match": 0,
      "advance": 0,
      "next_doc": 349037
    },

Any idea?

Good question! So this is due to some internal optimizations that Lucene makes. The summary can be found in the comment header of MultiTermQueryConstantScoreWrapper:

  This class also provides the functionality behind {@link MultiTermQuery#CONSTANT_SCORE_REWRITE}.
  It tries to rewrite per-segment as a boolean query that returns a constant score and otherwise
  fills a bit set with matches and builds a Scorer on top of this bit set.

Basically, the range is evaluated on each individual segment. If the segment only holds a small number of matching terms (16 or fewer), it rewrites the range into a boolean query of the individual terms. If the segment matches a larger number of terms, it generates a bitset and iterates over that as a "normal" range.

The reason comes down to speed: generating a bitset for all the documents in an index takes a certain amount of time. If there are not many terms to evaluate (which we can determine from the term dictionary for the segment), it's faster to skip the bitset generation and just check the terms individually with a boolean query.

But boolean queries slow down as more terms have to be evaluated, so at some point it makes sense to pay the cost of building the bitset: with many terms to check, the time is made back during the range evaluation.
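To make that decision concrete, here is a small, dependency-free sketch of the per-segment choice. The names here (Segment, rewriteRange, TERM_LIMIT) are made up for illustration; the real logic lives in Lucene's MultiTermQueryConstantScoreWrapper and works on the segment's term dictionary and doc-id sets rather than plain Java collections.

import java.util.BitSet;
import java.util.List;
import java.util.Map;

public class RangeRewriteSketch {

    // Threshold described above: 16 or fewer matching terms -> boolean of term queries.
    static final int TERM_LIMIT = 16;

    // Minimal stand-in for a segment: each matching term maps to the doc ids that contain it.
    record Segment(Map<String, int[]> postingsByTerm, int maxDoc) {}

    static String rewriteRange(Segment segment, List<String> matchingTerms) {
        if (matchingTerms.size() <= TERM_LIMIT) {
            // Few terms: cheaper to rewrite into a boolean of individual term queries
            // and skip building a bitset entirely.
            return "BooleanQuery(" + String.join(" OR ", matchingTerms) + ")";
        }
        // Many terms: pay the one-time cost of a bitset, then iterate it like a "normal" range.
        BitSet matches = new BitSet(segment.maxDoc());
        for (String term : matchingTerms) {
            for (int doc : segment.postingsByTerm().getOrDefault(term, new int[0])) {
                matches.set(doc);
            }
        }
        return "BitSetIterator(" + matches.cardinality() + " matching docs)";
    }

    public static void main(String[] args) {
        Segment segment = new Segment(
            Map.of("1468829704652", new int[] {0, 3},
                   "1468829704653", new int[] {1}),
            10);
        // Only two matching terms in this segment -> rewritten as a boolean of term queries,
        // which is what the profile output above is showing.
        System.out.println(rewriteRange(segment, List.of("1468829704652", "1468829704653")));
    }
}

The important part is the branch on the number of matching terms: a handful of terms becomes a boolean of term queries, while many terms justify the up-front cost of the bitset.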

If you re-run your profile on an index where each segment matches many terms, you'll see the output change.

Also note: in 5.0+, the Lucene output is much friendlier. It won't spam a bunch of binary terms, but will instead show a simple [0 TO 10] style output :slight_smile:

Hm, I have the problem that this behavior causes an otherwise simple query to immediately overflow my search queue (capacity 1000), causing 4000 rejections and making the whole system unusable for a while.

Is there any way to disable this feature?

I think you're encountering a different, unrelated problem. The query expansion/rewrite process still occurs in a single search context, i.e. under a single thread. The process described above won't fill up your search queue.

The search queue is filling up due to multiple concurrent queries that are being executed, not because of one query that is "expanding" to multiple search contexts. I'd suggest opening a thread about your problem to get more help, since it's likely unrelated to this thread.