Ordering of Nested Bool Filters


(Harlin) #1

I am sending a nested bool filter to my cluster, and the order in which each filter executes is very important. From what I understand, the filter that comes first should be executed first, but that doesn't seem to be what happens when I execute my query:

{
  "query" : {
    "filtered" : {
      "filter" : {
        "bool" : {
          "must" : [ {
            "bool" : {
              "should" : [ {
                "term" : {
                  "category" : 64
                }
              }, {
                "term" : {
                  "category" : 65
                }
              } ]
            }
          }, {
            "term" : {
              "identity" : 25914331
            }
          }, {
            "range" : {
              "timestamp" : {
                "from" : 1440057600000,
                "to" : 1440071999999,
                "include_lower" : true,
                "include_upper" : true
              }
            }
          } ]
        }
      }
    }
  }
}

I need the nested bool on the "category" field to execute first, though I am not sure this is what is actually happening. Any insight would be greatly appreciated.

Thanks,
Harlin


(Zachary Tong) #2

Actually, the only time order matters is for the and/or/not family of compound filters. They are order dependent because of their internal execution path... which is also why they are usually sub-optimal for performance.

The bool filter operates differently: it internally re-arranges the filters to produce the most efficient execution path. The bool filter basically aligns the various filter bitsets and executes the least expensive one first (determined heuristically, usually the sparsest filter). The bitsets then "leapfrog" their iterators to visit as few documents as possible.
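
To make the "leapfrog" idea concrete, here is a minimal sketch in Python. It is not Lucene's actual code: it assumes each filter is just a sorted list of matching doc IDs, uses list length as a stand-in for the cost heuristic, and the function name `leapfrog_intersect` is made up for illustration.

```python
from bisect import bisect_left
from typing import List, Sequence


def leapfrog_intersect(postings: Sequence[List[int]]) -> List[int]:
    """Intersect sorted doc-ID lists (a simplified stand-in for filter bitsets)."""
    if not postings or any(len(p) == 0 for p in postings):
        return []

    # Assumption: the shortest list is the cheapest, so it goes first,
    # regardless of the order the caller supplied.
    ordered = sorted(postings, key=len)
    positions = [0] * len(ordered)
    result: List[int] = []
    candidate = ordered[0][0]

    while True:
        agreed = True
        for i, plist in enumerate(ordered):
            # "Advance" this iterator to the first doc ID >= candidate.
            pos = bisect_left(plist, candidate, positions[i])
            if pos == len(plist):
                return result  # one list is exhausted, no more matches possible
            positions[i] = pos
            if plist[pos] != candidate:
                # This list jumped past the candidate: leapfrog to its doc ID
                # and re-check all lists against the new candidate.
                candidate = plist[pos]
                agreed = False
                break
        if agreed:
            result.append(candidate)
            candidate += 1  # look for the next match after this doc


# Hypothetical doc IDs matching each clause of a query like the one above.
category_docs = [1, 3, 5, 7, 9, 11]
identity_docs = [5, 9]
timestamp_docs = [2, 5, 6, 9, 10]
matches = leapfrog_intersect([category_docs, identity_docs, timestamp_docs])
```

Note that the result is the same whichever order the three lists are passed in; only the amount of skipping changes, which is why the order of clauses in the query does not matter.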

If you're interested in the technical details, Adrien has a good talk about how conjunctions (boolean combinations) work in Lucene here: https://berlinbuzzwords.de/file/bbuzz-2015-adrien-grand-algorithms-and-data-structures-power-lucene-and-elasticsearch

So, the moral of the story is: don't worry about order :slight_smile: Lucene will choose the fastest/best order for you.

