Help me modify this query so that it returns only the documents that contain N terms in a specific order, rather than all documents that contain the N terms in any order.

This is what I have. It can query for phrases, but they are always matched in any order:

{
    'query': {
        'bool': {
            'should': [
                {'bool': {'must': [
                    {'match_phrase': {'data': 'word1'}},
                    {'match_phrase': {'data': 'word2'}},
                ]}}
            ]
        }
    },
    'from': 0,
    'size': 9000,
}

The results returned by the above are:

bla bla word1 word2 bla
X Y word2 word1 bla bla
MNO bla bla word1 word2
ABC word1 word2 bla bla

The results I want for a query of (word1, word2) are:

bla bla word1 word2 bla
ABC word1 word2 bla bla
MNO bla bla word1 word2

Similarly, for a query of (word2, word1), I want the results:

X Y word2 word1 bla bla

Can someone please tell me how to fix this query so that it takes the order of the terms into account?
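To make the two behaviours concrete, here is a small local sketch (not Elasticsearch code) that reproduces them on the example documents. It assumes a rough whitespace/word tokenizer with lowercasing as a stand-in for the analyzer on `data`; `tokenize`, `contains_all` and `contains_in_order` are hypothetical helper names.

```python
import re

def tokenize(text):
    # Assumption: lowercase and split into word characters, a rough
    # stand-in for Elasticsearch's standard analyzer.
    return re.findall(r"\w+", text.lower())

def contains_all(text, terms):
    # What the bool/must query above does: every term present, any order.
    tokens = set(tokenize(text))
    return all(t.lower() in tokens for t in terms)

def contains_in_order(text, terms):
    # The desired behaviour: terms appear in the given order, gaps allowed.
    # The shared iterator means each search resumes after the previous match.
    tokens = iter(tokenize(text))
    return all(any(tok == t.lower() for tok in tokens) for t in terms)

docs = [
    "bla bla word1 word2 bla",
    "X Y word2 word1 bla bla",
    "MNO bla bla word1 word2",
    "ABC word1 word2 bla bla",
]

for query in (["word1", "word2"], ["word2", "word1"]):
    print(query, [d for d in docs if contains_in_order(d, query)])
```

`contains_all` accepts all four documents for either query, while `contains_in_order` keeps only the ones listed above for each query.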

Hi,

You may want to look at Intervals Queries. One possible way to rewrite the query you provided is as follows:

GET my_index/_search
{
  "query": {
    "intervals": {
      "data": {
        "match": {
          "ordered": true,
          "query": "word1 word2"
        }
      }
    }
  }
}
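Since the question comes from Python, a minimal sketch of building the same intervals query as a Python dict might look like this. `ordered_intervals_query` is a hypothetical helper; `max_gaps` is a documented option of the intervals `match` rule (`-1`, the default, means no limit on the distance between terms; `0` means the terms must be adjacent). The index name in the comment is only an example.

```python
import json

def ordered_intervals_query(field, text, max_gaps=-1):
    """Build an intervals query body matching the terms of `text` in order."""
    return {
        "query": {
            "intervals": {
                field: {
                    "match": {
                        "query": text,
                        "ordered": True,      # serialized as the JSON boolean true
                        "max_gaps": max_gaps,  # -1: any number of positions between terms
                    }
                }
            }
        }
    }

body = ordered_intervals_query("data", "word1 word2")
print(json.dumps(body, indent=2))
# On a cluster that supports intervals queries this could be sent with the
# official client, e.g. es.search(index="my_index", body=body)
```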

There are probably a few other ways to approach using word order in queries, so let me know if this doesn't solve your problem.
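One such alternative, sketched here as an assumption rather than a tested recipe, is a `span_near` query with `in_order` set to true; span queries predate intervals queries, so this may also work on older clusters. Note that `span_term` does not analyze its input, so each term must already be in its indexed form (for example lowercased). `ordered_span_near_query` and the default slop are illustrative choices.

```python
def ordered_span_near_query(field, terms, slop=9999):
    """Build a span_near query requiring `terms` to appear in the given order."""
    return {
        "query": {
            "span_near": {
                # span_term is not analyzed: pass terms in their indexed form.
                "clauses": [{"span_term": {field: term}} for term in terms],
                "slop": slop,      # max number of intervening unmatched positions
                "in_order": True,  # reject matches where the order differs
            }
        }
    }

body = ordered_span_near_query("data", ["word1", "word2"])
```

Unlike a sloppy `match_phrase`, where a large slop also allows the terms to be transposed, `in_order` keeps the order fixed while still allowing gaps.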

-William


Hi William,

Thank you for your prompt response.

I'm using the Python API. If I run a search like this:

body = {
  "query": {
    "intervals": {
      "data": {
        "match": {
          "ordered": True,
          "query": "word1 word2"
        }
      }
    }
  }
}
result = es.search(index=index, body=body)

I get an error:

RequestError(400, 'parsing_exception', 'no [query] registered for [intervals]')

Apparently, match_phrase is a solution, but according to this article that is not the case:

body = {
    "query": {
        "multi_match" : {
            "query": "word1 word2",
            "fields": ["data"],
            "type": "phrase",
            "slop": 9999
        }
    }
}

I just tried this and can confirm that it returns the same results regardless of the order of the words.

The intervals query was introduced in Elasticsearch 7.0, I believe. Which version are you using?

If it's a version issue, I might be using an older one. Is there any way I can check?

Running curl localhost:9200 should give you the version number.

If you have the URL of your Elasticsearch instance, you can curl it or open it in a browser to see the version information. You should see something like this:

{
  "name" : "...",
  "cluster_name" : "...",
  "cluster_uuid" : "...",
  "version" : {
    "number" : "7.2.0",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "508c38a",
    "build_date" : "2019-06-20T15:54:18.811730Z",
    "build_snapshot" : false,
    "lucene_version" : "8.0.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

And I apologize for not mentioning that Intervals queries are a somewhat new feature.
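From Python, the same document is returned by the client's `es.info()` call. A small sketch for checking whether a cluster is new enough might look like this; `parse_version` and `supports_intervals` are hypothetical helpers, and the `(7, 0)` cutoff only follows the estimate given in this thread, so adjust it if the release notes say otherwise.

```python
import json

def parse_version(info):
    """Turn the "version.number" field of the root endpoint into a tuple of ints."""
    number = info["version"]["number"]   # e.g. "7.2.0"
    core = number.split("-")[0]          # drop suffixes such as "-SNAPSHOT"
    return tuple(int(part) for part in core.split("."))

def supports_intervals(info, minimum=(7, 0)):
    # Assumption: cutoff taken from this thread's estimate, not verified here.
    return parse_version(info) >= minimum

sample = json.loads('{"version": {"number": "7.2.0"}}')
print(parse_version(sample), supports_intervals(sample))
# With a live client: supports_intervals(es.info())
```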
