Dynamic score calculation

I have documents like this (a simplified case, of course):

Category: A, Rating: 10
Category: A, Rating: 9
Category: A, Rating: 5

Category: B, Rating: 8
Category: B, Rating: 5
Category: B, Rating: 4
Category: B, Rating: 1

Category: C, Rating: 9
Category: C, Rating: 8
Category: C, Rating: 3
Category: C, Rating: 2

We can assume that the category can also be numeric; it does not matter to me.

What I need to achieve is to sort the results by:

  • rotating categories -> ABC ABC ABC, and when some category is "finished" (let's assume A), then for example BC BC BC, and then when B is finished, just the remaining C C C

  • the secondary condition is rating, sorted from highest to lowest

So example of expected output would be:

Category: A, Rating: 10
Category: B, Rating: 8
Category: C, Rating: 9

Category: A, Rating: 9
Category: B, Rating: 5
Category: C, Rating: 8

Category: A, Rating: 5
Category: B, Rating: 4
Category: C, Rating: 3

Category: B, Rating: 1
Category: C, Rating: 2

Does anyone have a hint on how to achieve this?

I tried playing with function_score and multiple script_score options, but no luck.

My idea at some point was to give each category a number and then somehow increment it (but how?). For example, with A = 1, B = 2, C = 3, documents in A would get scores like 11, 21, 31, 41 and so on, B -> 12, 22, 32, 42, C -> 13, 23, 33, 43, etc. But I cannot get to a satisfying solution.
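To make the idea concrete, here is a minimal sketch outside Elasticsearch, in plain Python, of what those scores would look like for the sample data. The `CATEGORY_NUMBER` mapping and the `round_scores` helper are hypothetical names used only for illustration, and the formula assumes fewer than ten categories so that the category number fits in the last digit. Note that the "round" part depends on how many better-rated documents of the same category exist, which is information about other documents.

```python
from collections import defaultdict

# Sample documents from above as (category, rating) pairs.
docs = [
    ("A", 10), ("A", 9), ("A", 5),
    ("B", 8), ("B", 5), ("B", 4), ("B", 1),
    ("C", 9), ("C", 8), ("C", 3), ("C", 2),
]

# Hypothetical category numbers: A = 1, B = 2, C = 3.
CATEGORY_NUMBER = {"A": 1, "B": 2, "C": 3}


def round_scores(docs):
    """Return (score, category, rating) tuples, score = round * 10 + category number."""
    seen = defaultdict(int)  # how many docs of each category were already scored
    scored = []
    # Visit the best-rated docs first so the round counter follows rating order.
    for category, rating in sorted(docs, key=lambda d: d[1], reverse=True):
        seen[category] += 1
        score = seen[category] * 10 + CATEGORY_NUMBER[category]
        scored.append((score, category, rating))
    # Ascending score gives the rotated order: 11, 12, 13, 21, 22, 23, ...
    return sorted(scored)


ordered = round_scores(docs)
for score, category, rating in ordered:
    print(score, category, rating)
```

Sorting ascending by this score reproduces the expected output listed above for the sample data.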

Maybe someone has another idea? Any ideas are appreciated.

Btw, in the real case there will be more sorting/scoring criteria than just one rating. I simplified it for this question.

Welcome to the Elastic community :)!

Curious question! I started working it out, and my main problem in the end is a conflict between the requirement "category as primary sort condition" and the fact that you need the categories to repeat a pattern rather than go A, A, A, B, B, B, B, C, C, C, C. That's not really sorting by category anymore; it's more like forcing a very specific order. You can't assign a numerical score such that an A further back in the queue is greater than some other A, given equal ratings for both, yet that is your requirement, because A, A is not acceptable given the presence of a B with the same rating. It's more like you want to assign a numerical score to the relationship between elements, which is context-specific (i.e. a B following an A with the same rating has the highest score). That starts to involve traversing the list of hits for context.

Here are some thoughts on assigning different simple weights to the category and what that results in:

If A == 300, B == 200, C == 100 and the sort score = category + rating, that makes category the primary sort criterion. Any C comes below any B, breaking the A, B, C pattern.

With category values of the same magnitude as rating, e.g. A == 3, B == 2 and C == 1 (sort score = category + rating again), it's just a random mess since rating affects the A, B, C pattern directly now.

With A == 0.3, B == 0.2 and C == 0.1 (sort score = category + rating as usual), we actually do achieve a stable A, B, C pattern within any one given rating. Neat, but it makes rating the primary criterion. An A9 will come before a B8, which is explicitly not what you want.
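To see the three weightings side by side, here is a rough Python sketch that applies each one to the sample documents from the question. It only mimics the "sort score = category + rating" formula locally; `order_by_weights` is a made-up helper, not anything from Elasticsearch.

```python
# Sample documents from the question as (category, rating) pairs.
docs = [
    ("A", 10), ("A", 9), ("A", 5),
    ("B", 8), ("B", 5), ("B", 4), ("B", 1),
    ("C", 9), ("C", 8), ("C", 3), ("C", 2),
]


def order_by_weights(docs, weights):
    """Sort descending by rating + category weight (the sort score described above)."""
    return sorted(docs, key=lambda d: d[1] + weights[d[0]], reverse=True)


large = order_by_weights(docs, {"A": 300, "B": 200, "C": 100})
same = order_by_weights(docs, {"A": 3, "B": 2, "C": 1})
small = order_by_weights(docs, {"A": 0.3, "B": 0.2, "C": 0.1})

for name, result in (("large", large), ("same", same), ("small", small)):
    # Print each ordering compactly, e.g. "A10 A9 A5 B8 ...".
    print(name, " ".join(f"{c}{r}" for c, r in result))
```

With the large weights the categories come out fully grouped, and with the small weights A9 lands ahead of B8, which matches the two problems described above.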

As I mentioned above, you're at the point where you'd need to traverse the list of hits and just reorder the docs. I'd sort by rating first, category second. Then for all 10s, pick the next A, then the next B, then the next C.

I personally wouldn't write this in Painless though; in this limited theoretical case I'd just fetch the docs and reorder them in the application. Depending on what other factors you want to include in the scoring, Elasticsearch may still be the best place. The ease of allocating resources also matters: you can start more ES nodes easily, so if your application can't handle the compute or memory requirements for the scale of your data, and it's harder to add compute capacity to it than to ES, then I would do the forced reorder in ES.
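As a rough idea of what the application-side reorder could look like, here is a minimal Python sketch. It assumes the hits have already been fetched and reduced to dicts with `category` and `rating` keys (mirroring `_source`), and that the pattern wanted is a round-robin over categories with the best rating first inside each category. `interleave_by_category` is a hypothetical helper name.

```python
from collections import defaultdict
from itertools import zip_longest

# Assumed input: already-fetched hits, reduced to their _source fields.
hits = [
    {"category": "A", "rating": 10}, {"category": "A", "rating": 9},
    {"category": "A", "rating": 5},
    {"category": "B", "rating": 8}, {"category": "B", "rating": 5},
    {"category": "B", "rating": 4}, {"category": "B", "rating": 1},
    {"category": "C", "rating": 9}, {"category": "C", "rating": 8},
    {"category": "C", "rating": 3}, {"category": "C", "rating": 2},
]


def interleave_by_category(hits):
    """Round-robin over categories, highest rating first within each category."""
    by_category = defaultdict(list)
    for hit in hits:
        by_category[hit["category"]].append(hit)

    # One queue per category (in category order), each sorted best rating first.
    queues = [
        sorted(by_category[category], key=lambda h: h["rating"], reverse=True)
        for category in sorted(by_category)
    ]

    # zip_longest yields one "round" at a time; exhausted categories give None.
    result = []
    for round_hits in zip_longest(*queues):
        result.extend(h for h in round_hits if h is not None)
    return result


reordered = interleave_by_category(hits)
for hit in reordered:
    print(hit["category"], hit["rating"])
```

Categories that run out simply drop out of later rounds, so the tail becomes BC BC and then C C without special handling. With more scoring criteria, the per-category sort key would be the place to plug them in.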

This isn't a question of compound queries. You can't just fetch all As because you need A-B-C context. If you fetch only all Rating = 10, there's still no way I can see to produce the A-B-C pattern rather than an A-A-B-B-C-C pattern, except "manual" reordering.

EDIT:

Document setup for the scenario given in OP. Scoring for my third option.

In Kibana Dev Console:

PUT index1
{
  "mappings": {
    "properties": {
      "category": {"type": "keyword"},
      "rating": {"type": "integer"} 
    }
  }
}

POST index1/_doc
{"category": "A", "rating": 10}
POST index1/_doc
{"category": "A", "rating": 9}
POST index1/_doc
{"category": "A", "rating": 5}
POST index1/_doc
{"category": "B", "rating": 8}
POST index1/_doc
{"category": "B", "rating": 5}
POST index1/_doc
{"category": "B", "rating": 4}
POST index1/_doc
{"category": "B", "rating": 1}
POST index1/_doc
{"category": "C", "rating": 9}
POST index1/_doc
{"category": "C", "rating": 8}
POST index1/_doc
{"category": "C", "rating": 3}
POST index1/_doc
{"category": "C", "rating": 2}

GET index1/_search
{
  "query": {
    "match_all": {}
  },
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "lang": "painless",
        "source": "double cat_num; if (doc['category'].value == 'A') { cat_num = 0.3; } else if (doc['category'].value == 'B') { cat_num = 0.2; } else if (doc['category'].value == 'C') { cat_num = 0.1;} else { cat_num = 0;} return doc.rating.value + cat_num;"
      },
      "order": "desc"
    }
  }
}
