Explanation of terms in a search

Hello. I am trying to understand this passage, which is in the 6.8 REST API docs:

By default, all terms are optional, as long as one term matches. A search for foo bar baz will find any document that contains one or more of foo or bar or baz. We have already discussed the default_operator above which allows you to force all terms to be required.

Since the default_operator is OR, doesn't that mean that all the terms are required and none are actually optional? If I specify "foo bar baz" doesn't that actually become "foo OR bar OR baz" under the covers? Maybe it's just the wording that's confusing.

Thanks in advance to all who respond.

No, as in asking to fetch someone “dead OR alive”, it means either of these options is good.

Thank you, I got a similarly clear explanation from reading the docs on simple_query_string.

And/or terminology is confusing to most.
I have a personal interest in finding ways to express query logic visually - as a graphical flow that is more easily understood.
Text syntax can be confusing - people might want to find “cats AND dogs” but there is no animal that is simultaneously a cat and a dog so nothing matches. That is why AND can be counter-intuitive.
I’m keen to understand if you had a similar expectation with the use of OR? Is there a particular query you had in mind (like my cats and dogs example) where a typical English speaker would expect one thing but the search engine does something else?

I don't have a particular query in mind. I'm new to ES and just trying to learn as much as I can in support of my work for my current project. The code we inherited tried to do boolean queries but I don't think they work properly. So far, the customer hasn't requested more sophisticated queries involving boolean constructs, but I'm just trying to educate myself as much as possible in the event that they do. To reference my original post, I'm hung up/confused on the wording and the way I am parsing it logically; maybe my logic circuits are in need of repair. It says "all terms are optional, as long as one term matches". Then, what are they - required? I'm okay with the 2nd sentence; that is what I would expect. But then the 3rd sentence throws me: I think it means that by setting the default_operator to AND, all the terms become required. The default operator of OR relieves me from writing foo OR bar OR baz; similarly, I can (a) set the default operator to AND and have this resolve to foo AND bar AND baz or (b) explicitly state the boolean condition in the query: foo AND bar AND baz.

Hopefully that makes more sense? :thinking:
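
For reference, options (a) and (b) could be written as request bodies roughly like this. This is only a sketch using Python dicts: the query text is the foo bar baz example from the docs, and nothing here is sent to a cluster.

```python
import json

# Default behaviour: whitespace-separated terms are combined with OR.
query_default = {"query": {"query_string": {"query": "foo bar baz"}}}

# (a) Set default_operator so the same text is treated as foo AND bar AND baz.
query_a = {
    "query": {
        "query_string": {"query": "foo bar baz", "default_operator": "AND"}
    }
}

# (b) Spell the operators out in the query text itself.
query_b = {"query": {"query_string": {"query": "foo AND bar AND baz"}}}

# Print one body as JSON, e.g. to paste into a search request.
print(json.dumps(query_a, indent=2))
```

If my reading is right, (a) and (b) should return the same documents, while the default form returns anything containing at least one of the three terms.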

Lucene’s query parser (used to parse the “query_string” parts of the JSON) is perhaps a little different to regular Boolean logic.
The TL;DR is that you are advised to put brackets around almost everything for ANDs/ORs to make any sense.

To understand why, it helps to know the background. Lucene is unlike databases that use binary matching logic, where records either match a query or don’t. Instead, Lucene is designed to match documents to a search to varying degrees. It can take a bag of search terms and relevance-rank documents based on:

  1. the number of search terms that matched
  2. how rare the matching terms are overall
  3. how frequently the terms are repeated in the document.

The more of these boxes a document ticks, the higher its relevance score.
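
As a rough illustration of those three factors, here is a toy scorer in Python. It is not Lucene’s actual formula; the function name, the weighting and the tiny corpus are all made up for this sketch.

```python
import math
from collections import Counter

def toy_score(query_terms, doc_tokens, corpus):
    """Toy relevance score: sum of (term frequency * rarity) over matching terms."""
    n_docs = len(corpus)
    counts = Counter(doc_tokens)
    score = 0.0
    for term in query_terms:
        tf = counts[term]  # factor 3: how often the term repeats in this document
        if tf == 0:
            continue  # factor 1: unmatched terms add nothing to the score
        df = sum(1 for d in corpus if term in d)
        idf = math.log(1 + n_docs / df)  # factor 2: rarer terms weigh more
        score += tf * idf
    return score

# Hypothetical three-document corpus, already split into tokens.
corpus = [
    ["foo", "bar", "baz"],
    ["foo", "foo", "qux"],
    ["qux", "quux"],
]
scores = [toy_score(["foo", "bar"], doc, corpus) for doc in corpus]
```

The first document matches both terms, including the rarer one, so it outranks the second, which only repeats foo. The third matches nothing and scores zero.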

This matching mode is suited to matching free text with various options. However, sometimes users want to add mandatory clauses (products MUST be in my price range or MUST_NOT contain meat). As soon as a Boolean query has any of these mandatory clauses added, the optional parts are relegated to being entirely optional (but still “preferable”).
If you don’t have any mandatory clauses in a Boolean query, then at least one of the listed optional clauses has to match (otherwise you’d match documents that were entirely irrelevant).
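
Here is that rule as a minimal Python sketch. It is a simplification that only handles plain terms in MUST and optional clauses and ignores MUST_NOT and scoring; bool_matches is a made-up helper, not a Lucene or Elasticsearch API.

```python
def bool_matches(doc_terms, must=(), should=()):
    """Return True if a document matches a simplified Boolean query.

    must   -- terms that are all required
    should -- optional terms
    """
    doc = set(doc_terms)
    if must:
        # With mandatory clauses present, the optional ones no longer
        # decide whether the document matches.
        return all(term in doc for term in must)
    # With no mandatory clauses, at least one optional clause has to match.
    return any(term in doc for term in should)

# Only optional terms: one hit is enough, zero hits is not.
only_optional_hit = bool_matches(["foo"], should=["foo", "bar", "baz"])
only_optional_miss = bool_matches(["qux"], should=["foo", "bar", "baz"])

# Mandatory terms present: the optional term alone does not help.
optional_alone = bool_matches(["baz"], must=["foo", "bar"], should=["baz"])
mandatory_met = bool_matches(["foo", "bar"], must=["foo", "bar"], should=["baz"])
```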

Let’s build up with some examples:

 elasticsearch  

only matches documents that contain the word elasticsearch.

elasticsearch OR elastic

matches documents that contain either of these words and preferably both

 elasticsearch OR elastic AND search

is weird. Because we introduced a mandatory clause using AND, the effect can be surprising. Lucene’s parser sees this as 2 MUST clauses (elastic and search) and one entirely optional, extra-points-if-you-have-it clause (elasticsearch). In the more verbose JSON syntax, the parsed bool query is effectively

Bool
    Must
        Elastic
        Search
    Should
        Elasticsearch 

So a document containing only “Elasticsearch” would not match.
The solution is to wrap the parts in brackets to make multiple Boolean expressions, e.g.

  elasticsearch OR (elastic AND search)

This is parsed into:

 Bool
     Should
           Elasticsearch
           Bool
                Must
                    Elastic
                    Search

Note that the root bool now has only 2 optional clauses and no mandatory ones, which means a document has to match at least one of them.
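
Written out as actual bool queries, the two parses would look roughly like the Python dicts below. This is a sketch: the field name body and the use of term queries are assumptions for illustration, since the real output depends on your mapping and analyzer.

```python
import json

FIELD = "body"  # hypothetical field name, not from this thread

def term(value):
    """Build a term query clause on the hypothetical field."""
    return {"term": {FIELD: value}}

# elasticsearch OR elastic AND search  (no brackets)
unbracketed = {
    "bool": {
        "must": [term("elastic"), term("search")],
        "should": [term("elasticsearch")],
    }
}

# elasticsearch OR (elastic AND search)
bracketed = {
    "bool": {
        "should": [
            term("elasticsearch"),
            {"bool": {"must": [term("elastic"), term("search")]}},
        ]
    }
}

# Print the bracketed form as a JSON search body.
print(json.dumps({"query": bracketed}, indent=2))
```

The only difference is where the must list sits: at the root it makes elastic and search required for every hit, while nested inside a should clause it is just one of two alternatives.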

Horribly complex I know, but the moral of the story is “use brackets” when mixing ANDs with ORs. This is good practice with other databases anyhow.

Mark - Thank you for taking the time to write this response and provide good examples. You just read my mind. While reviewing the documentation for the query_string query, I was puzzled by the first example: this AND that OR thus and I was going to post another question about this. But you fully answered the question, which for me is an important one, as our customers may want to have more complicated Boolean logic enabled for their searching. Much appreciated!

I just tried your explanation against my data and got the results exactly as you described. Thank you again for taking the time to explain it so well.
