Boost relevance score based on word proximity in Elasticsearch

I am working on a project where I need to boost relevance based on word proximity in Elasticsearch. Say the index has a field called statement, and doc 1 has the following value in statement:

 "statement":["this is a dog",
              "it is brown in colour",
              "it is very fluffy"]

and doc 2 has the following value in statement:

 "statement":["this is a brown fluffy dog",
              "it plays in the garden"]

If I query for "beautiful fluffy dog", I get both doc 1 and doc 2 back with the same relevance. What I want is for doc 2 to score higher than doc 1, because in doc 2 "fluffy" and "dog" appear in the same sentence, whereas in doc 1 they are scattered across different values of statement. I am using a phrase query with a slop of 10 and have "position_increment_gap" set to 100 in the mapping. I am on ES version 2.3.0.

We don't have queries that both boost on proximity and allow some query terms to be missing. If document 1 had "beautiful" among its terms, should it rank better than document 2 because it contains all query terms, or worse because document 2 has better proximity for two of the query terms?

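You could try using SHOULD clauses to boost the score of documents based on the proximity of adjacent query terms. Since your position_increment_gap (100) is larger than the slop (10), each phrase clause can only match within a single value of statement: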

GET index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "statement": "beautiful fluffy dog"
          }
        }
      ],
      "should": [
        {
          "match_phrase": {
            "statement": {
              "query": "beautiful fluffy",
              "slop": 10
            }
          }
        },
        {
          "match_phrase": {
            "statement": {
              "query": "fluffy dog",
              "slop": 10
            }
          }
        }
      ]
    }
  }
}

Alternatively, if the extra phrase clauses slow down your queries too much, apply them through rescoring so that they only run on the top hits.
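
As a rough sketch of the rescoring variant, the same two phrase clauses can be moved out of the main query and into a rescore section. The window_size of 50 and the two weights below are placeholder values I picked for illustration, not recommendations, so tune them against your own data:

```json
GET index/_search
{
  "query": {
    "match": {
      "statement": "beautiful fluffy dog"
    }
  },
  "rescore": {
    "window_size": 50,
    "query": {
      "rescore_query": {
        "bool": {
          "should": [
            {
              "match_phrase": {
                "statement": {
                  "query": "beautiful fluffy",
                  "slop": 10
                }
              }
            },
            {
              "match_phrase": {
                "statement": {
                  "query": "fluffy dog",
                  "slop": 10
                }
              }
            }
          ]
        }
      },
      "query_weight": 1,
      "rescore_query_weight": 2
    }
  }
}
```

With this shape the plain match query selects and ranks candidates, and the phrase clauses are only evaluated on the top window_size hits per shard. Hits that match neither phrase keep their original score, so doc 2 would move ahead of doc 1 only if both fall inside the rescore window.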
