Elasticsearch query string query default_operator

The following query finds only the documents that contain all of the words "look", "for" and "this":

"query": {
    "query_string": {
        "query": "look for this",
        "default_operator": "AND"
    }
}

While this one finds all documents that contain any of those three words:

"query": {
    "query_string": {
        "query": "look for this",
        "default_operator": "OR"
    }
}

My question is how to change the query so that it performs a Google-style search, i.e. it first lists all the documents that contain all of the terms and then the documents that contain only some of them.

I would appreciate any help.

Hi,

You shouldn't need to change the query to get the behaviour you describe. If you test this with some simple documents like these:

PUT /index/type/1 
{
  "text" : "look for this and something more"
}

PUT /index/type/2
{
  "text" : "look for this"
}

PUT /index/type/3
{
  "text" : "this look is not so good"
}

PUT /index/type/4
{
  "text" : "one look is not enough"
}

and then use the AND operator as you described, you get only the documents that contain all of the terms:

"hits": [
      {
        "_index": "index",
        "_type": "type",
        "_id": "2",
        "_score": 1.5686159,
        "_source": {
          "text": "look for this"
        }
      },
      {
        "_index": "index",
        "_type": "type",
        "_id": "1",
        "_score": 0.80226827,
        "_source": {
          "text": "look for this and something more"
        }
      }
    ]

The reason document "2" ranks higher is that, in standard scoring, terms appearing in shorter fields carry more weight (see e.g. https://www.elastic.co/guide/en/elasticsearch/guide/current/relevance-intro.html for some explanation).

Using the OR operator, you get all the documents that contain any of the terms, with the documents containing more terms ranked higher:

"hits": [
      {
        "_index": "index",
        "_type": "type",
        "_id": "2",
        "_score": 1.5686159,
        "_source": {
          "text": "look for this"
        }
      },
      {
        "_index": "index",
        "_type": "type",
        "_id": "1",
        "_score": 0.80226827,
        "_source": {
          "text": "look for this and something more"
        }
      },
      {
        "_index": "index",
        "_type": "type",
        "_id": "3",
        "_score": 0.53484553,
        "_source": {
          "text": "this look is not so good"
        }
      },
      {
        "_index": "index",
        "_type": "type",
        "_id": "4",
        "_score": 0.16203022,
        "_source": {
          "text": "one look is not enough"
        }
      }
    ]

When terms appear across different fields, scoring gets more complicated, but that's a separate discussion. Generally speaking, using OR should rank the documents that contain all (or many) of the search terms higher.
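To make the ranking behaviour above a bit more tangible, here is a toy sketch in JavaScript. It is not Elasticsearch's scoring formula; it only mimics two of the effects mentioned (more matching terms rank higher, shorter fields rank higher) on the four test documents. The helper names `tokenize` and `toySearch` are made up for this illustration.

```javascript
// Toy illustration only: this is NOT how Elasticsearch computes _score.
const docs = [
  { id: "1", text: "look for this and something more" },
  { id: "2", text: "look for this" },
  { id: "3", text: "this look is not so good" },
  { id: "4", text: "one look is not enough" },
];

// Lowercase and split on whitespace; a crude stand-in for a real analyzer.
const tokenize = (text) => text.toLowerCase().split(/\s+/).filter(Boolean);

// Hypothetical helper: filter with AND/OR semantics, then rank by the
// number of matched terms (descending) and by field length (ascending).
function toySearch(query, operator) {
  const terms = tokenize(query);
  return docs
    .map((doc) => {
      const tokens = tokenize(doc.text);
      const matched = terms.filter((t) => tokens.includes(t)).length;
      return { id: doc.id, matched, length: tokens.length };
    })
    .filter((r) =>
      operator === "AND" ? r.matched === terms.length : r.matched > 0
    )
    .sort((a, b) => b.matched - a.matched || a.length - b.length)
    .map((r) => r.id);
}

console.log(toySearch("look for this", "AND")); // [ '2', '1' ]
console.log(toySearch("look for this", "OR"));  // [ '2', '1', '3', '4' ]
```

The order matches the hit lists above, but only because the toy rules were chosen to mirror them; the real scores come from the similarity model Elasticsearch is configured with.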


"OR" is not working for me. This is what my query looks like:

query = {
    size: 500,
    from: event.currentPage,
    "query": {
        "indices": {
            "indices": legit_indexes,
            "query": {
                "query_string": {
                    "query": event.term,
                    "default_operator": "OR"
                }
            },
            "no_match_query": "none"
        }
    },
    "aggs": {
        "types": {
            "terms": {
                "field": "datasource"
            }
        }
    },
    "sort": [{
        "@timestamp": {
            "order": "desc"
        }
    }]
};

As you may guess, legit_indexes is a list of indexes and event.term is whatever the user searches for on my website. This query returns documents containing any number of the searched words. Can you see what the problem is with my query?

Hi,

The query looks okay at first glance, and you say that you get results when using OR. One reason why documents with fewer matching terms might show up higher in the result list could be that search relevance scores are specific to one index (among other things, they use term frequencies, field lengths and document counts for scoring, but only consider the documents in that one index). Combining these scores generally works well when the indices you are querying contain somewhat similar data, but if their data is vastly different, the scores might not be easily comparable. I can't tell exactly, but this might be the case for you. The OR operator doesn't seem to be the problem to me.
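As a rough illustration of why scores from different indices may not be comparable, here is a small sketch of the inverse document frequency from the classic Lucene TF-IDF similarity (the default in older Elasticsearch versions; newer ones use BM25, where the same locality applies). The index statistics below are invented for the example, and as far as I know the statistics are actually gathered per shard rather than per index.

```javascript
// Classic Lucene TF-IDF inverse document frequency:
//   idf = 1 + ln(numDocs / (docFreq + 1))
function classicIdf(numDocs, docFreq) {
  return 1 + Math.log(numDocs / (docFreq + 1));
}

// Hypothetical statistics for the same term in two different indices.
const indexA = { numDocs: 1000, docFreq: 9 };   // the term is rare here
const indexB = { numDocs: 1000, docFreq: 499 }; // the term is common here

const idfA = classicIdf(indexA.numDocs, indexA.docFreq); // 1 + ln(100)
const idfB = classicIdf(indexB.numDocs, indexB.docFreq); // 1 + ln(2)

// The same term contributes far more to the score in index A than in index B.
console.log(idfA.toFixed(3), idfB.toFixed(3)); // 5.605 1.693
```

If the indices really hold similar data, the two values should end up close to each other and the combined ranking stays sensible.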

Thank you for the explanations. The data across all the indexes we have are very similar to each other.

I got rid of the sorting by @timestamp and now it seems to be working as expected. However, the reason I had the sorting in the first place was that I need the query to do something like this:

First: list the documents that have all the terms, ordered by the @timestamp field
Then: list the documents with fewer matching terms, again ordered by the @timestamp field

Is this possible? How should I do it?

Hi,

That is a question that comes up sometimes. Unfortunately, the intents of scoring and sorting sometimes collide in the way you describe. What I have seen people do in these cases is one of the following:

  • Issue multiple queries, sort each result set and re-combine them on the client side. This is useful, for example, if you really want separate sections in your search results (e.g. first ten docs from category A, sorted by recency, then 10 docs from category B sorted by price, then something else from category C)
  • Use Function Score Query to somehow combine the search score with the recency (the date)
  • Use Rescoring of your Top N results to rescore only those results
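The first option in the list above could look roughly like the following sketch: one request body with default_operator AND, one with OR, both sorted by @timestamp, and a client-side merge that drops the OR hits already returned by the AND query. The helper names `buildTierQuery` and `mergeTiers` are hypothetical, and the hits are hand-written stand-ins for two search responses; sending the requests is left out.

```javascript
// Hypothetical helper: one request body per tier, each sorted by recency.
function buildTierQuery(term, operator) {
  return {
    query: { query_string: { query: term, default_operator: operator } },
    sort: [{ "@timestamp": { order: "desc" } }],
  };
}

// Identify a hit by index, type and id so duplicates can be detected.
const hitKey = (hit) => `${hit._index}/${hit._type}/${hit._id}`;

// AND hits first (already sorted by @timestamp), then the remaining OR hits.
function mergeTiers(andHits, orHits) {
  const seen = new Set(andHits.map(hitKey));
  return [...andHits, ...orHits.filter((hit) => !seen.has(hitKey(hit)))];
}

// Hand-written hits standing in for the "hits.hits" arrays of two responses.
const andHits = [
  { _index: "index", _type: "type", _id: "2" },
  { _index: "index", _type: "type", _id: "1" },
];
const orHits = [
  { _index: "index", _type: "type", _id: "4" },
  { _index: "index", _type: "type", _id: "2" },
  { _index: "index", _type: "type", _id: "3" },
  { _index: "index", _type: "type", _id: "1" },
];

console.log(mergeTiers(andHits, orHits).map((hit) => hit._id));
// [ '2', '1', '4', '3' ]
```

This only separates "all terms" from "at least one term"; finer tiers would need more queries, and paging across the merged list takes some extra bookkeeping on the client.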
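For the Function Score and Rescoring options mentioned above, the request bodies might look something like this sketch. The decay scale of "7d", the rescore weights and the window size are placeholder assumptions that would need tuning, and the builder function names are made up. Neither variant gives a strict "all terms first, then the rest" ordering; they only push the ranking in that direction.

```javascript
// Function Score: multiply the text relevance by a recency decay on @timestamp.
function buildFunctionScoreQuery(term) {
  return {
    query: {
      function_score: {
        query: { query_string: { query: term, default_operator: "OR" } },
        functions: [
          // Assumed scale: the decay factor drops to 0.5 for docs 7 days old.
          { gauss: { "@timestamp": { origin: "now", scale: "7d", decay: 0.5 } } },
        ],
        boost_mode: "multiply",
      },
    },
  };
}

// Rescoring: match with OR, then boost top-N hits that also match with AND.
function buildRescoreQuery(term, windowSize) {
  return {
    query: { query_string: { query: term, default_operator: "OR" } },
    rescore: {
      window_size: windowSize, // only the top N hits per shard are rescored
      query: {
        rescore_query: {
          query_string: { query: term, default_operator: "AND" },
        },
        query_weight: 1,          // assumed weights, tune as needed
        rescore_query_weight: 10,
      },
    },
  };
}

console.log(JSON.stringify(buildFunctionScoreQuery("look for this"), null, 2));
console.log(JSON.stringify(buildRescoreQuery("look for this", 100), null, 2));
```

As far as I know, rescoring cannot be combined with an explicit sort on a field such as @timestamp, so the "sort" section would have to be dropped from the request when trying that route.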