Function_score with order


(Doron Tsur) #1

I have documents for which I calculate a function_score with the search query below. The scoring seems to work well, yet some documents get the same score. This causes results to repeat, and some documents never come back. In essence I am sorting by score. I wonder if I can apply some secondary sorting to the results so that the order (i.e. the score) is unique. For example, each document has a unique string field, and I would like to order groups of documents with the same score by it.

{
  "from": "some_number",
  "size": "some_number",
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "should": [
            "some terms ..."
          ],
          "minimum_should_match": 1,
          "must": [
            "some must filters"
          ]
        }
      },
      "functions": [
        {
          "gauss": {
            "createdAt": {
              "scale": "30d",
              "decay": 0.8
            }
          },
          "weight": 0.3
        },
        {
          "script_score": {
            "script": {
              "source": "return _score + some calulation"
            }
          },
          "weight": 0.6
        }
      ],
      "boost_mode": "sum"
    }
  }
}

I've tried several things, like applying a sort and field_value_factor, but that doesn't seem to work. I'd appreciate any other tips.


(Abdon Pijpelink) #2

When you say that applying a sort did not work, can you explain why not? It is a common pattern to sort on a document's _id as the secondary sort, as a tie-breaker, if you wish to get a consistent ordering.

Something like this should work:

GET _search
{
  "query": {
    "function_score": {
      "query": {
        YOUR QUERY
      },
      "functions": [
        YOUR FUNCTIONS
      ]
    }
  },
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    },
    {
      "_id": {
        "order": "desc"
      }
    }
  ]
}

(Doron Tsur) #3

I've tried this type of sorting before, but after reading your recommendation I tried again. It seems that the gauss function is the culprit: when I comment out this scoring function, there are no duplicates in the results. The createdAt values are all at (or around) the same time, so I understand why there are duplicates, but I don't understand why the secondary sort doesn't take care of that.


(Abdon Pijpelink) #4

I'm guessing that what's happening is this: all documents have a slightly different value for createdAt. As a result, they get different scores. Even if two documents differ by just a few milliseconds, their scores are going to be slightly different, and as a result the secondary sort order never comes into play.

Would it be possible to reindex your documents with less precise values for createdAt? For example, if you round createdAt down to the nearest hour, then all documents that were created at about the same time will get the same score. You would then get your desired deterministic sort order by using _id as the secondary sort criterion.
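One way to sketch that rounding at index time is an ingest pipeline with a script processor. This is only an illustration: it assumes createdAt is indexed as an ISO-8601 date string, and the pipeline name and the createdAtHour target field are made-up names.

```json
PUT _ingest/pipeline/round-created-at
{
  "description": "Sketch: copy createdAt, rounded down to the hour, into createdAtHour",
  "processors": [
    {
      "script": {
        "source": "ctx.createdAtHour = ZonedDateTime.parse(ctx.createdAt).truncatedTo(ChronoUnit.HOURS).toString()"
      }
    }
  ]
}
```

Pointing the gauss function at createdAtHour instead of createdAt would then give documents created within the same hour identical scores, letting the _id tie-breaker kick in.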


(Doron Tsur) #5

Perhaps I'm misunderstanding something. Doesn't scale do that somehow?


(Abdon Pijpelink) #6

No, scale determines how fast the score decays toward zero the further you get from the origin (by default, the current datetime). See the diagram in the decay functions section of our documentation.
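For context, the decay-function docs define gauss as exp(-(max(0, |value - origin| - offset))^2 / (2σ^2)), with σ chosen so that the score is exactly decay at a distance of scale from the origin. A small Python sketch of that formula, with field values as epoch milliseconds and the numbers from the query above (scale=30d, decay=0.8):

```python
import math

def gauss_decay(value_ms, origin_ms, scale_ms, decay, offset_ms=0):
    """Gauss decay curve as defined in the Elasticsearch decay-function docs."""
    x = max(0.0, abs(value_ms - origin_ms) - offset_ms)
    # sigma is chosen so that the score is exactly `decay` at distance `scale`
    sigma2 = -scale_ms ** 2 / (2.0 * math.log(decay))
    return math.exp(-x ** 2 / (2.0 * sigma2))

DAY = 86_400_000  # epoch milliseconds per day

print(gauss_decay(0, 0, 30 * DAY, 0.8))                   # at the origin → 1.0
print(round(gauss_decay(30 * DAY, 0, 30 * DAY, 0.8), 3))  # exactly `scale` away → 0.8
```

Since every millisecond of difference in createdAt moves a document along this curve, two documents created a few milliseconds apart get slightly different scores.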


(Doron Tsur) #7

Rounding down to the hour can work, but it will eventually change the order. Why not compare to epoch time somehow (ascending order)? That seems to make more sense stability-wise.


(Abdon Pijpelink) #8

Maybe I'm misunderstanding your issue. Internally, Elasticsearch stores dates as epoch milliseconds. My assumption was that what you are seeing is caused by documents not having exactly the same value for createdAt, so those documents all get slightly different scores.
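If reindexing is not an option, the rounding could also happen at query time. A rough sketch that replaces the gauss entry in the functions array with a script_score computing the same decay shape (0.8 at 30 days) on the hour-rounded epoch value — the params.now value is a placeholder the client would fill with the current epoch milliseconds, and it assumes createdAt is a date field:

```json
{
  "script_score": {
    "script": {
      "source": "long ms = doc['createdAt'].value.toInstant().toEpochMilli(); long hour = ms - ms % 3600000L; double days = (params.now - hour) / 86400000.0; return Math.pow(0.8, Math.pow(days / 30.0, 2));",
      "params": { "now": 1700000000000 }
    }
  },
  "weight": 0.3
}
```

Documents created within the same hour would then score identically, so the _id tie-breaker decides their order.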


(system) closed #9

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.