Scores not consistent between environments

dnd · February 2, 2016, 2:21pm

I have a very simple person name search that returns different scores in different environments. Let's say my search in both environments is for Lori Jefferies. In my test environment, the results are scored as I expect: of the three users I have (Lori Jefferies, Lori Packer, and Lori Polca), Lori Jefferies scores highest because two fields match.

In my production environment, however, all users score exactly the same, even though Lori Jefferies matches both fields. This gist shows the results with explanation in both environments.

https://gist.github.com/dnd/4c8fa89fe10aeb1511b5

production environment results.json

// There are many more users that have been snipped out of this, but they all score exactly the same matching the first name 'lori'
[
  {
    "attributes": {
      "first_name": "Lori",
      "last_name": "Packer",
      "phone": null,
      "created_at": "2015-01-14T21:44:47.000Z",
      "updated_at": "2015-01-14T21:55:08.000Z",
      "role_ids": [

This file has been truncated.

test environment results.json

[
  {
    "attributes": {
      "first_name": "Lori",
      "last_name": "Jefferies",
      "phone": "1-110-013-9586",
      "created_at": "2016-02-01T21:50:54.751Z",
      "updated_at": "2016-02-01T21:50:54.751Z",
      "role_ids": [],
      "id": "6",

This file has been truncated.

There are probably about 40 matching users in total in production, but I don't understand why Lori Jefferies is not scored highest and ranked accordingly, since even the production explanation shows that both first_name and last_name match.

My query is a pretty simple multi_match: {"multi_match":{"query":"lori jefferies","fields":["email","first_name","last_name","phone"]}}

Any help would be appreciated.

Thanks,
Steve

Ivan · February 2, 2016, 5:25pm

It appears that your two environments differ greatly in the number of
documents they contain. Is your environment sharded? All your hits come from
the same shard, so I am assuming it is not. Production has over a million
documents, while test has only six (in the shard containing the hits).

Because of the increased volume on production, the idf for the last_name
field is not scored as highly as on test. Since the default type of
multi_match is best_fields, it executes a dis_max query underneath, which
scores each document by its best-matching field; here that is always the
first name.
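
To make the two effects above concrete, here is a minimal Python sketch. It assumes the classic Lucene TF-IDF similarity (the default at the time of this thread) and uses made-up per-field scores, not the real numbers from the gist. The function names `classic_idf` and `dis_max_score` are hypothetical helpers, not Elasticsearch APIs.

```python
import math


def classic_idf(num_docs: int, doc_freq: int) -> float:
    """Inverse document frequency as defined by classic Lucene TF-IDF:
    1 + ln(numDocs / (docFreq + 1)). Both inputs are per-shard statistics."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def dis_max_score(field_scores, tie_breaker=0.0):
    """Combine per-field scores the way a dis_max query does: the best
    field's score plus tie_breaker times the sum of the other scores."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie_breaker * (sum(field_scores) - best)


# The same term gets a different weight when the shard sizes differ.
idf_small_shard = classic_idf(num_docs=6, doc_freq=1)
idf_large_shard = classic_idf(num_docs=1_000_000, doc_freq=1)

# Hypothetical per-field scores, chosen only for illustration.
first_name_only = dis_max_score([2.0])            # matches first_name
both_fields = dis_max_score([2.0, 1.5])           # matches first_name and last_name
both_with_tie = dis_max_score([2.0, 1.5], tie_breaker=0.3)

print(idf_small_shard, idf_large_shard)
print(first_name_only, both_fields, both_with_tie)
```

With a tie_breaker of 0.0, a document matching two fields scores the same as one matching only the stronger field, whenever that stronger field is the same in both. Raising tie_breaker lets the second field contribute.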

For more consistent behavior, try indexing more content on test so that
its IDF values are closer to production's. Also try playing around with
the multi_match settings, perhaps opting for cross_fields.
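
As a starting point for that experiment, here is a sketch that builds the original query with an explicit `type`. `type` is a real multi_match parameter with `best_fields` and `cross_fields` among its values; the helper name `name_search_query` is hypothetical, and whether `cross_fields` gives the ranking you want on your data is something to verify with the explain output.

```python
import json


def name_search_query(text, fields, match_type="cross_fields"):
    """Build a multi_match query body with an explicit type, so the
    choice between best_fields and cross_fields is visible in the request."""
    return {
        "multi_match": {
            "query": text,
            "fields": list(fields),
            "type": match_type,
        }
    }


# Same text and fields as the query in the original post.
body = {
    "query": name_search_query(
        "lori jefferies",
        ["email", "first_name", "last_name", "phone"],
    )
}

# Print the JSON that would be sent as the search request body.
print(json.dumps(body, indent=2))
```

Passing `match_type="best_fields"` reproduces the default behavior, which makes it easy to compare the two rankings side by side.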

Cheers,

Ivan
