Elastic Function_Score not applying properly

I'm working with Elasticsearch 6.5.3 and using function_score to change the weights of particular results. I expect the results with the highest weight to come first, followed by the rest. Below is my query:

GET www-test-index/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "operator": "and",
          "query": "foo",
          "fields": [
            "title^10",
            "content^5"
          ]
        }
      },
      "functions": [
        {
          "filter": {
            "match": {
              "url": "https://www-somesite/sports/first"
            }
          },
          "weight": 50
        },
        {
          "filter": {
            "match": {
              "url": "https://www-somesite/sports/second"
            }
          },
          "weight": 49
        }
      ],
      "max_boost": 42,
      "score_mode": "sum",
      "boost_mode": "multiply",
      "min_score": 42
    }
  }
}

When I run the above query, the result with https://www-somesite/sports/second appears in first place, followed by https://www-somesite/sports/first. I'm confused about how to get https://www-somesite/sports/first into first place.

There are a few things that could be going on:

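  • The max_boost of 42 is lower than the weights of both functions (50 and 49). As a result, both URLs will get the same boost, capped at 42.
  • Is url a text field or a keyword field? If it is a text field, both URLs will match both of your function filters, because a match query analyzes the URL into separate terms and, by default, only one of them has to match. If so, you may want to change url into url.keyword in your function queries, provided your mapping defines that sub-field.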
  • How many shards does your index have? With more than one shard, while testing on small datasets, you may get unexpected scoring results. Try creating the www-test-index index with one shard.
  • Can I increase the boost value to the maximum number?

  • url is a text field in my case.

  • I am using 4 shards, and my index holds 10M docs.

Here is my config.

{
   "settings":{
      "index":{
         "number_of_shards":4,
         "number_of_replicas":0,
         "refresh_interval":"500ms",
         "analysis":{
            "analyzer":{
               "my_analyzer":{
                  "tokenizer":"whitespace",
                  "filter":[
                     "lowercase",
                     "my_snow",
                     "my_stemmer",
                     "my_synonym",
                     "my_stop"
                  ]
               }
            },
            "filter":{
               "my_stop":{
                  "type":"stop",
                  "stopwords_path":"stopwords/stopwords_en.txt"
               },
               "my_snow":{
                  "type":"snowball",
                  "language":"English"
               },
               "my_synonym":{
                  "type":"synonym_graph",
                  "expand":false,
                  "synonyms_path":"analysis/synonym.txt"
               },
               "my_stemmer":{
                  "type":"stemmer",
                  "name":"english"
               }
            }
         }
      }
   },
   "mappings":{
      "doc":{
         "_source":{
            "enabled":true
         },
         "properties":{
            "title":{
               "type":"text",
               "index":"true",
               "store":true,
               "analyzer":"my_analyzer"
            },
            "content":{
               "type":"text",
               "index":"true",
               "store":true,
               "analyzer":"my_analyzer"
            },
            "url":{
               "type":"text",
               "index":"true",
               "store":true
            },
            "host":{
               "type":"keyword",
               "index":"true",
               "store":true
            }            
         }
      }
   }
}

  • Yes. In your case, if you want to keep max_boost, you'd need to set it to at least the highest weight (50).
  • Does it make sense for url to be a text field? Typically, fields like url are only used for exact searches, not for full-text queries. Often, it makes the most sense to map a field like url as type keyword instead.
    As a workaround, you could change the match queries in your functions into match_phrase queries.
  • If you're working with millions of documents, then you can forget what I said about shards. I wanted to make sure you were not testing against 2 documents in an index with the default of 5 shards.
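
To make the max_boost and text-versus-keyword points above concrete, here is a rough Python sketch of the arithmetic. It is not Elasticsearch code. The regex tokenizer only approximates the default standard analyzer, the helper names are made up for illustration, the query scores of 1.00 and 1.01 are hypothetical, min_score is ignored, and the neutral factor of 1 for "no function matched" is an assumption.

```python
import re

FIRST = "https://www-somesite/sports/first"
SECOND = "https://www-somesite/sports/second"
# (filter value, weight) pairs taken from the query in the question.
FUNCTIONS = [(FIRST, 50), (SECOND, 49)]


def tokens(text):
    # Approximation of the standard analyzer: lowercase, split on non-alphanumerics.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def text_match(value, query):
    # match on a text field: default operator is OR, so one shared term is enough.
    return bool(tokens(value) & tokens(query))


def exact_match(value, query):
    # Behaviour of a keyword field: the whole value must be equal.
    return value == query


def boosted_score(query_score, url, matches, max_boost):
    weights = [w for target, w in FUNCTIONS if matches(url, target)]
    # score_mode "sum": add up the weights of every function whose filter matches.
    # Assumption: when no function matches, the factor is a neutral 1.
    total = sum(weights) if weights else 1
    # max_boost caps the combined function score.
    capped = min(total, max_boost)
    # boost_mode "multiply": query score times function score.
    return query_score * capped


def ranking(query_scores, matches, max_boost):
    scored = {
        url: boosted_score(score, url, matches, max_boost)
        for url, score in query_scores.items()
    }
    return sorted(scored, key=scored.get, reverse=True)


# Hypothetical query scores: "second" is very slightly more relevant to "foo".
QUERY_SCORES = {FIRST: 1.00, SECOND: 1.01}

print(ranking(QUERY_SCORES, text_match, 42))   # text field, max_boost 42
print(ranking(QUERY_SCORES, exact_match, 42))  # keyword field, max_boost 42
print(ranking(QUERY_SCORES, exact_match, 50))  # keyword field, max_boost 50
```

In this model, the first two calls rank the second URL first, because both documents end up with the same capped boost of 42 and the query score decides the order. Only the third call, with exact matching and max_boost raised to 50, ranks the first URL first. Since boost_mode is multiply, a 50 versus 49 weight is only about a 2% edge, so a larger gap in query scores could still outweigh it.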
