Inaccurate Elasticsearch scores with custom scoring functions

My Elasticsearch scores become inaccurate when the function weights differ by many orders of magnitude.

I'm using Elasticsearch 7.10.1.

If I have an index that looks something like this:

[{:index=>
   {:_index=>"candidates",
    :_id=>"a1786607-e095-4621-bdf9-de2706475614",
    :data=>
     {:name=>"Carli Stark",
      :is_verified=>true, :has_work_permit=>true}}},
 {:index=>
   {:_index=>"candidates",
    :_id=>"57f78d3f-392e-4cdf-a5ff-6d10e7c89d5b",
    :data=>
     {:name=>"Gayla Keeling",
      :is_verified=>false, :has_work_permit=>true}}}]

And then I perform a query like this:

GET candidates/_search
{
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "functions": [
        {
          "filter": {
            "term": {
              "is_verified": true
            }
          },
          "weight": 1000
        },
        {
          "filter": {
            "term": {
              "has_work_permit": true
            }
          },
          "weight": 100000000000
        }
      ],
      "score_mode": "sum",
      "boost_mode": "replace"
    }
  },
  "_source": ["is_verified", "has_work_permit"],
  "size": 50
}

For some reason, I get exactly the same scores for both documents, which is unexpected:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 9.9999998E10,
    "hits" : [
      {
        "_index" : "candidates_production_20240828114811152",
        "_type" : "_doc",
        "_id" : "cbd1b70b-f889-4136-a43e-f6782955f58e",
        "_score" : 9.9999998E10,
        "_source" : {
          "is_verified" : false,
          "has_work_permit": true
        }
      },
      {
        "_index" : "candidates_production_20240828114811152",
        "_type" : "_doc",
        "_id" : "d644a5e5-09e0-496e-8830-c1a772c46611",
        "_score" : 9.9999998E10,
        "_source" : {
          "is_verified" : true,
          "has_work_permit": true
        }
      }
    ]
  }
}

But if I use weights that are closer together, for example 10000 instead of 1000 for is_verified, then the scores differ and the results come back in the expected order:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.00000006E11,
    "hits" : [
      {
        "_index" : "candidates_production_20240828114811152",
        "_type" : "_doc",
        "_id" : "d644a5e5-09e0-496e-8830-c1a772c46611",
        "_score" : 1.00000006E11,
        "_source" : {
          "is_verified" : true,
          "has_work_permit": true
        }
      },
      {
        "_index" : "candidates_production_20240828114811152",
        "_type" : "_doc",
        "_id" : "cbd1b70b-f889-4136-a43e-f6782955f58e",
        "_score" : 9.9999998E10,
        "_source" : {
          "is_verified" : false,
          "has_work_permit": true
        }
      }
    ]
  }
}

Any idea why this happens? And is there a way to fix it?


Why do you want to boost by such huge values?

I guess it's related to the internals. According to this doc, the max is 3.402823e+38:

The new score can be restricted to not exceed a certain limit by setting the max_boost parameter. The default for max_boost is FLT_MAX.

Maybe you are hitting this?
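Another possible cause, offered as an assumption rather than something confirmed in this thread: _score is a 32-bit float, and near 1e11 adjacent 32-bit floats are 8192 apart. Adding 1000 to the large weight would then round back to the same value, while adding 10000 would land on the next representable one. The Python sketch below only mimics that rounding with the standard struct module. The helper names (to_float32, summed_score) are hypothetical, and the exact order in which Elasticsearch sums and rounds the weights is assumed, not taken from its source.

```python
import struct

FLT_MAX = 3.4028234663852886e38  # largest finite 32-bit float


def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def summed_score(weights):
    # Assumption: each weight is held as a 32-bit float, the matching
    # weights are summed (score_mode "sum"), and the result is rounded
    # back to a 32-bit float before being returned as _score.
    return to_float32(sum(to_float32(w) for w in weights))


# Weights from the question: has_work_permit = 1e11, is_verified = 1000.
permit_only = summed_score([100_000_000_000])
both_1000 = summed_score([1_000, 100_000_000_000])
# Same query with is_verified raised to 10000.
both_10000 = summed_score([10_000, 100_000_000_000])

print(permit_only < FLT_MAX)     # True: far below the max_boost default
print(permit_only)               # 99999997952.0, i.e. 9.9999998E10
print(both_1000 == permit_only)  # True: the extra 1000 is rounded away
print(both_10000)                # 100000006144.0, i.e. 1.00000006E11

# Smaller weights whose sums stay below 2**24 (16777216) are exact in
# a 32-bit float, so the two documents get distinct scores.
small_permit = summed_score([1_000_000])
small_both = summed_score([1_000, 1_000_000])
print(small_permit, small_both)  # 1000000.0 1001000.0
```

If this is what is happening, the values match the scores in the question, and the fix would be to keep the same ordering with much smaller weights (for example 1000000 and 1000) so that every possible sum stays exactly representable.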

I'm using Elasticsearch 7.10.1.

Please upgrade to at least 7.17. Your version is unsafe.
Better still, switch to 8.15.0.
