Function Score with Random Score is ignoring a simple sort

Hi,

I have a search on my index that currently works. I want to return the professionals that have a subscription first (is_subscribed is a boolean), followed by the rest of the users. I achieve this with the following query:

POST /my_index/professionals/_search
{
  "from":0,
  "size":40,
  "query":{
     "constant_score":{
        "filter":{
           "bool":{
              "should":[
                 {
                    "term":{
                       "primary_locality_id":4
                    }
                 },
                 {
                    "term":{
                       "secondary_locality_id":4
                    }
                 }
              ]
           }
        }
     }
  },
  "sort":[
     {
        "is_subscribed":{
           "order":"desc"
        }
     }
  ]
}

This gives the expected output, but to be fair to our users we would like to randomise these results while keeping the prioritised items (in this case, is_subscribed: true) at the top, followed by the rest. While Googling, I came across function_score with a random_score and a seed to randomise the items. I wrote this query:

{
   "from":0,
   "size":57,
   "query":{
      "function_score":{
         "query":{
            "constant_score":{
               "filter":{
                  "bool":{
                     "should":[
                        {
                           "term":{
                              "primary_locality_id":4
                           }
                        },
                        {
                           "term":{
                              "secondary_locality_id":4
                           }
                        }
                     ]
                  }
               }
            }
         },
         "random_score":{
            "seed":1484879348
         }
      }
   },
   "sort":[
      {
         "is_subscribed":{
            "order":"desc"
         }
      }
   ]
}

That always returns the same results, no matter what the seed is. I know this is happening because of this part:

"sort":[
  {
     "is_subscribed": {
        "order":"desc"
     }
  }
]

This is affecting the function_score: if I remove it, the results are actually randomised and they respect the different random_score.seed values. But that doesn't fix my issue, as I still need to prioritise the subscribed users.

I've read through the documentation and Googled, and couldn't find a way of combining both so that the sort takes priority. What am I missing? How can I sort first and then randomise?

Thanks in advance,

Sort orders essentially throw away the score that is generated by the fulltext query, and sort only by the provided sort values by default.

If you want the value of is_subscribed to be merged into the score, there are two options:

  • Add "_score" as a sort order and enable score tracking, which is turned off by default when sorting by a field. This treats _score as a tie-breaker in the sort ordering. For example, if you sort by is_subscribed first, _score will only be used to sort documents that share the same is_subscribed value. So the is_subscribed: true documents come first, sorted by _score, then all the is_subscribed: false documents come next, sorted by _score, etc.

  • Remove the sort parameter and add some kind of query which looks at is_subscribed and merges that into the overall score in some fashion (part of a boolean must clause, boosting, etc). This is the more flexible option, but it doesn't guarantee complete ordering by one field first like the sort param does. E.g. there might be an is_subscribed: false document which ranks higher than a true one if it has a particularly rare token that scores highly or something.
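
A minimal sketch of the first option, assuming the same index fields as the queries in this thread. The helper names build_search_body and simulate_order are made up for illustration, and simulate_order is only a local stand-in for how a two-key sort orders hits, not Elasticsearch itself. Each sort key is written as its own element of the "sort" array, so the priority does not depend on how a client happens to order the keys inside a single JSON object.

```python
import json


def build_search_body(locality_id, seed):
    """Build a request body that sorts by is_subscribed, then by the random _score.

    Hypothetical helper: the field names come from the thread, the function
    itself is only an illustration.
    """
    return {
        "track_scores": True,
        "query": {
            "function_score": {
                "query": {
                    "constant_score": {
                        "filter": {
                            "bool": {
                                "should": [
                                    {"term": {"primary_locality_id": locality_id}},
                                    {"term": {"secondary_locality_id": locality_id}},
                                ]
                            }
                        }
                    }
                },
                "random_score": {"seed": seed},
            }
        },
        # One array element per sort key: the array order is the sort priority.
        "sort": [
            {"is_subscribed": {"order": "desc"}},
            {"_score": {"order": "desc"}},
        ],
    }


def simulate_order(hits):
    """Local stand-in for the two-key sort: subscribed first, then score descending."""
    return sorted(
        hits,
        key=lambda hit: (hit["is_subscribed"], hit["_score"]),
        reverse=True,
    )


if __name__ == "__main__":
    # Print the body so it can be pasted into a _search request.
    print(json.dumps(build_search_body(locality_id=4, seed=1), indent=2))
```

Changing the seed changes the order inside each is_subscribed group, while the grouping itself stays fixed because _score is only consulted when two documents have the same is_subscribed value.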

Hi @polyfractal

Thank you for your detailed answer and both options. From what I read, it seems that only the first option is the one I should follow, as the second might give some false positives.

I've gone ahead and rewritten my query to enable score tracking and also sort by "_score", but it doesn't seem to give me the output you described.

This is my rewritten query:

{
   "track_scores": true,
   "from":0,
   "size":57,
   "query":{
      "function_score":{
         "query":{
            "constant_score":{
               "filter":{
                  "bool":{
                     "should":[
                        {
                           "term":{
                              "primary_locality_id":4
                           }
                        },
                        {
                           "term":{
                              "secondary_locality_id":4
                           }
                        }
                     ]
                  }
               }
            }
         },
         "random_score":{
            "seed": 1
         }
      }
   },
   "sort":[
      {
        "is_subscribed":{
            "order":"desc"
        },
        "_score" : {
             "order":"desc"
        }
      }
   ]
}

As you can see, I enable score tracking and sort by is_subscribed first and then by _score, but my results don't show the is_subscribed: true documents first.

Here is the detailed output with some redacted information:

{
  "hits": {
    "hits": [
      {
        "sort": [
          0.99855137,
          1
        ],
        "_type": "professionals",
        "_source": {
          "first_name": "First Name",
          "is_subscribed": true,
          "primary_locality_id": 4,
        },
        "_score": 0.99855137,
        "_index": "my_index",
        "_id": "2051766"
      },
      {
        "sort": [
          0.9551844,
          1
        ],
        "_type": "professionals",
        "_source": {
          "first_name": "First Name",
          "is_subscribed": true,
          "primary_locality_id": 4,
        },
        "_score": 0.9551844,
        "_index": "my_index",
        "_id": "2051648"
      },
      {
        "sort": [
          0.95447606,
          1
        ],
        "_type": "professionals",
        "_source": {
          "first_name": "First Name",
          "is_subscribed": true,
          "primary_locality_id": 4,
        },
        "_score": 0.95447606,
        "_index": "my_index",
        "_id": "2051322"
      },
      {
        "sort": [
          0.93707305,
          0
        ],
        "_type": "professionals",
        "_source": {
          "first_name": "First Name",
          "is_subscribed": false,
          "primary_locality_id": 4,
        },
        "_score": 0.93707305,
        "_index": "my_index",
        "_id": "2051063"
      },

        // 8 more results with is_subscribed: false

      {
        "sort": [
          0.8206227,
          1
        ],
        "_type": "professionals",
        "_source": {
          "first_name": "First Name",
          "is_subscribed": true,
          "primary_locality_id": 4,
        },
        "_score": 0.8206227,
        "_index": "my_index",
        "_id": "2050420"
      },
      
        // 4 more results with is_subscribed: false,

      {
        "sort": [
          0.7055144,
          1
        ],
        "_type": "professionals",
        "_source": {
          "first_name": "First Name",
          "is_subscribed": true,
          "primary_locality_id": 4,
        },
        "_score": 0.7055144,
        "_index": "my_index",
        "_id": "2050676"
      },
      {
        "sort": [
          0.6966923,
          1
        ],
        "_type": "professionals",
        "_source": {
          "first_name": "First Name",
          "is_subscribed": true,
          "primary_locality_id": 4,
        },
        "_score": 0.6966923,
        "_index": "my_index",
        "_id": "2051403"
      },
      {
        "sort": [
          0.6890349,
          0
        ],
        "_type": "professionals",
        "_source": {
          "first_name": "First Name",
          "is_subscribed": false,
          "primary_locality_id": 4,
        },
        "_score": 0.6890349,
        "_index": "my_index",
        "_id": "2050821"
      },
      
      // more mixed results

    ],
    "total": {
      "relation": "eq",
      "value": 57
    },
    "max_score": 0.99855137
  },
  "_shards": {
    "successful": 1,
    "failed": 2,
    "skipped": 0,
    "total": 3,
    "failures": [
      {
        "node": "xxxx",
        "index": ".kibana_1",
        "reason": {
          "index_uuid": "UQFK_z99SyeqezAmXfF_Nw",
          "index": ".kibana_1",
          "reason": "No mapping found for [is_subscribed] in order to sort on",
          "type": "query_shard_exception"
        },
        "shard": 0
      },
      {
        "node": "yyyy",
        "index": ".kibana_task_manager",
        "reason": {
          "index_uuid": "8U3zm8bKRROdd92wY_Scdw",
          "index": ".kibana_task_manager",
          "reason": "No mapping found for [is_subscribed] in order to sort on",
          "type": "query_shard_exception"
        },
        "shard": 0
      }
    ]
  },
  "took": 11,
  "timed_out": false
}

I've read your answer multiple times, but I can't figure out what I am doing wrong. Am I using _score correctly to sort the results?

Thanks in advance,

Hi @polyfractal

I haven't been able to resolve the issue mentioned above. Any idea why _score is not being sorted correctly?

Thanks in advance!
