Results have a similar score, but the number of fields matched varies

To simplify:

PUT /test/vendors/1
{
  "type": "doctor",
  "name": "Ron",
  "place": "Boston"  
}

PUT /test/vendors/2
{
  "type": "doctor",
  "name": "Tom",
  "place": "Boston"
}

PUT /test/vendors/3
{
  "type": "doctor",
  "name": "Jack",
  "place": "San Fran"
}

Then I search:

GET /test/_search
{
  "query": {
    "multi_match" : {
      "query":    "doctor in Boston", 
      "fields": [ "type", "place" ] 
    }
  }
}

I understand why I get Jack, who works in San Fran -- it's because he's a doctor too. However, I can't figure out why his match score is the SAME as the others'. The other two also matched on place, didn't they? Why aren't Ron and Tom scored higher?

{
  "took": 11,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.9245277,
    "hits": [
      {
        "_index": "test",
        "_type": "vendors",
        "_id": "2",
        "_score": 0.9245277,
        "_source": {
          "type": "doctor",
          "name": "Tom",
          "place": "Boston"
        }
      },
      {
        "_index": "test",
        "_type": "vendors",
        "_id": "1",
        "_score": 0.9245277,
        "_source": {
          "type": "doctor",
          "name": "Ron",
          "place": "Boston"
        }
      },
      {
        "_index": "test",
        "_type": "vendors",
        "_id": "3",
        "_score": 0.9245277,
        "_source": {
          "type": "doctor",
          "name": "Jack",
          "place": "San Fran"
        }
      }
    ]
  }
}

Is there a way to force a lower score when fewer search keywords are found? Also, if I'm going the wrong way about this kind of search and there's a better pattern/way to do it -- I'd appreciate being pointed in the right direction.

There are a few things going on here. First, you're using the default of 5 shards. With a dataset this small (3 documents), that distorts the scoring. Scoring is done at the shard level, so with 3 documents spread over 5 shards, each shard computes its term statistics from only the documents it holds, and scores become inconsistent from one shard to another. I would recommend creating the index with only one shard before indexing the documents:

PUT test
{
  "settings": {
    "number_of_shards": 1
  }
}

This will already partially solve the issue, as the two doctors from Boston will now rank highest.
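To make the shard effect concrete, here is a rough sketch in Python. It assumes the default BM25 similarity and only looks at the IDF part of the score (term frequency and field-length normalization are ignored), and it assumes each of the three documents landed on its own shard, which is a hypothetical placement, not something your response confirms. The function name `bm25_idf` is mine.

```python
import math

def bm25_idf(doc_count, doc_freq):
    """IDF term of BM25 as Lucene computes it: rarer terms get a higher weight."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

# Assumption: every document sits alone on its shard, so each shard sees
# exactly 1 document, and every term in it has a document frequency of 1.
per_shard_doctor = bm25_idf(doc_count=1, doc_freq=1)
per_shard_boston = bm25_idf(doc_count=1, doc_freq=1)

# With a single shard, statistics cover all 3 documents:
# "doctor" appears in 3 of 3, "boston" in 2 of 3.
single_shard_doctor = bm25_idf(doc_count=3, doc_freq=3)
single_shard_boston = bm25_idf(doc_count=3, doc_freq=2)

print("one doc per shard:", per_shard_doctor, per_shard_boston)
print("single shard:     ", single_shard_doctor, single_shard_boston)
```

Under those assumptions, "doctor" and "boston" carry exactly the same weight on every shard, so a document matching only "doctor" can score the same as one that also matches "boston". On a single shard, "boston" is the rarer term and gets the higher weight.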

There is another thing going on, though. By default, a multi_match query calculates its score using the so-called "best_fields" type. That means Elasticsearch calculates a score for each of the individual fields "type" and "place", and the highest of those becomes the overall score of the document.

You can see how the score is calculated by adding "explain": true to a search request:

GET /test/_search
{
  "explain": true, 
  "query": {
    "multi_match" : {
      "query":    "doctor in Boston", 
      "fields": [ "type", "place" ] 
    }
  }
}

Now, you'll be able to see the detailed breakdown of each document's score.
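If the explanation gets hard to read as raw JSON, a small helper can flatten it. The sketch below relies only on the `value`, `description`, and `details` keys that each explanation node carries. The `sample` dictionary is made up purely for illustration (both the numbers and the description strings), and `format_explanation` is a hypothetical helper name, not part of any client library.

```python
def format_explanation(node, depth=0):
    """Return an indented list of lines, one per node of an explanation tree."""
    lines = [f"{'  ' * depth}{node['value']:.4f}  {node['description']}"]
    for child in node.get("details", []):
        lines.extend(format_explanation(child, depth + 1))
    return lines

# Illustrative stand-in for the "_explanation" object of one hit.
# Values and descriptions are invented; real output is more detailed.
sample = {
    "value": 0.47,
    "description": "max of:",
    "details": [
        {"value": 0.13, "description": "score for field type (illustrative)", "details": []},
        {"value": 0.47, "description": "score for field place (illustrative)", "details": []},
    ],
}

lines = format_explanation(sample)
print("\n".join(lines))
```

In practice you would pass in the `_explanation` object from each hit of the response above instead of `sample`.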

You can change the way the score is calculated from "best_fields" to "most_fields". With "most_fields", the more fields match your query, the higher the score is going to be (Elasticsearch will sum the individual scores of each of the fields).

GET /test/_search
{
  "query": {
    "multi_match" : {
      "query":    "doctor in Boston", 
      "fields": [ "type", "place" ],
      "type": "most_fields"
    }
  }
}

This is probably what you will want to do.
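To see the difference between the two types side by side, here is a minimal Python sketch. It follows the description above (highest field score versus sum of field scores) and is not Elasticsearch's actual implementation. The per-field scores are made-up numbers, and the function names are mine. The `tie_breaker` parameter mirrors the multi_match option of the same name, which defaults to 0.0.

```python
def best_fields(field_scores, tie_breaker=0.0):
    """Take the highest field score, plus tie_breaker times the other scores."""
    matched = [s for s in field_scores.values() if s > 0]
    if not matched:
        return 0.0
    top = max(matched)
    return top + tie_breaker * (sum(matched) - top)

def most_fields(field_scores):
    """Add up the scores of every field that matched."""
    return sum(field_scores.values())

# Hypothetical per-field scores, chosen only to illustrate the behaviour.
docs = {
    "Ron":  {"type": 0.13, "place": 0.47},
    "Tom":  {"type": 0.13, "place": 0.47},
    "Jack": {"type": 0.13, "place": 0.0},   # "San Fran" does not match "Boston"
}

for name, scores in docs.items():
    print(name, best_fields(scores), most_fields(scores))
```

With these numbers, "best_fields" ignores the "type" match for Ron and Tom entirely, while "most_fields" rewards them for matching on both fields.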

"most_fields" did the job. I appreciate you taking the time to explain this option and the shards. Thanks!