What is the preferred way to run a filter-only query in 2.x?


#1

I'm currently migrating from Elasticsearch 1.x to 2.x and trying to wrap my head around the merging of filters and queries. Based on the documentation here, I see that in 2.x, a search that includes both a query and a filter should be wrapped into a bool query, with the query as the must clause and the filter as the filter clause. So if I only have a filter (assume it is a bool filter, as those existed in 1.x) and no query, what is the preferred way to do that?

  • Wrap the filter into another bool filter as the must clause.
  • Wrap the filter into another bool filter as the filter clause (with no must clause).
  • Pass the filter as the top-level query since the "filter" is actually a bool query now.
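For concreteness, the three options might look roughly like the request bodies below. This is only a sketch: the field names `status` and `age` and the variable names are made up for illustration, and the bodies are built as Python dicts that mirror the JSON query DSL.

```python
import json

# Hypothetical 1.x-style bool filter; the field names are invented for illustration.
legacy_filter = {
    "bool": {
        "must": [
            {"term": {"status": "active"}},
            {"range": {"age": {"gte": 18}}},
        ]
    }
}

# Option 1: wrap the filter as the must clause of an outer bool query.
option_1 = {"query": {"bool": {"must": legacy_filter}}}

# Option 2: wrap the filter as the filter clause of an outer bool query (no must clause).
option_2 = {"query": {"bool": {"filter": legacy_filter}}}

# Option 3: pass the old filter directly as the top-level query.
option_3 = {"query": legacy_filter}

for name, body in [("option 1", option_1), ("option 2", option_2), ("option 3", option_3)]:
    print(name, json.dumps(body))
```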

How are these options different (or are they the same), with respect to results returned, performance, idiomatic-ness, etc.?

I am currently using the third approach, mostly because it was the path of least resistance. However, since my filter construction logic was written based on Elasticsearch 1.x, all of the subclauses are under must, rather than filter. Does that mean by executing this filter as a top-level query, the subfilters are actually being executed in query context? Do I need to go and change every single must in all of my nested bool filters to filter? Apologies if this question has already been answered elsewhere. Thanks.


(Michael McCandless) #2

The best way to think of filter is that it's really a must clause with a score of 0. There really is no such thing as a "filter" anymore. BooleanQuery (BQ) only separates must and filter because filter means "do not alter the scores".

Lucene generally will rewrite a single-clause BQ nested into another BQ, so I suspect all three of your options are executed the same: same performance, same scores.

I would suggest you just do the 3rd option since it's the most straightforward.

It is worth changing must to filter in your app, if scoring is not important, because Lucene optimizes this, e.g. by not bothering to load/decode per-doc term frequency information when scoring.

Mike McCandless


#3

I suspect all three of your options are executed the same: same performance, same scores. I would suggest you just do the 3rd option since it's the most straightforward.

--

It is worth changing must to filter in your app, if scoring is not important, because Lucene optimizes this, e.g. by not bothering to load/decode per-doc term frequency information when scoring.

I'm a little confused - don't these two statements contradict each other? And on the second point, do you mean changing from must to filter in each of the nested filters (sorry, BQs - it'll take me a while to get used to that), or just top-level?


(Michael McCandless) #4

Oh, sorry, I misunderstood your option 3. If you have only a single filter you want to run, you should just put that into a single bool query, e.g.:

{
    "bool" : {
        "filter": {
            ...
        }
    }
}

This is more efficient than just using your query directly (your option 3) since Lucene will avoid computing scores. I think this was your option 2.

Sorry for the BQ acronym!

It should be sufficient to change to filter only in the top-level bool query: Lucene will skip computing scores for the nested clauses under it, even if those clauses still use must.
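A minimal sketch of that advice, assuming the 1.x filter is already available as a dict: the helper `as_filter_only` and the field names are hypothetical, not part of any Elasticsearch client. Only the top level changes; the nested must clauses are left exactly as they were.

```python
import copy
import json


def as_filter_only(legacy_filter):
    """Wrap an existing filter dict in a top-level bool query's filter clause.

    Hypothetical helper for illustration; nested clauses are copied unchanged.
    """
    return {"query": {"bool": {"filter": copy.deepcopy(legacy_filter)}}}


# Hypothetical 1.x-style filter whose nested clauses all sit under must.
legacy_filter = {
    "bool": {
        "must": [
            {"term": {"status": "active"}},
            {"bool": {"must": [{"range": {"age": {"gte": 18}}}]}},
        ]
    }
}

body = as_filter_only(legacy_filter)
print(json.dumps(body, indent=2))
```

The resulting `body` is what would be sent as the search request; the inner `must` keys do not need to be rewritten.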


#5

Thanks for your help. I'll have to give that a try.

