Welcome to the Elastic community :)!
Curious question! I started working it out, and the main problem I ran into is a conflict between two requirements: category should be the primary sort condition, yet the categories need to repeat in a pattern rather than go A, A, A, B, B, B, B, C, C, C, C. That's not really sorting by category anymore; it's more like forcing a very specific order. You can't assign a numerical score such that one A scores higher than another A further back in the queue when both have the same rating, yet that is what you'd need, because A, A is not acceptable when a B with the same rating is present. What you really want is a score on the relationship between elements, which is context-specific (i.e. a B following an A with the same rating has the highest score). That means traversing the list of hits for context.
Here are some thoughts on assigning different simple weights to the category and what each one results in:
If A == 300, B == 200, C == 100 and sort score = category + rating, category becomes the primary sort criterion. Any C comes below any B, breaking the A, B, C pattern.
With category values of the same magnitude as rating, e.g. A == 3, B == 2 and C == 1 (sort score = category + rating again), the result is a jumble, since rating now interferes with the A, B, C pattern directly.
With A == 0.3, B == 0.2 and C == 0.1 (sort score = category + rating as usual), we actually do achieve a stable A, B, C pattern within any one given rating. Neat, but it makes rating the primary criterion. An A9 will come before a B8, which is explicitly not what you want.
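To see the three weightings side by side without a cluster, here is a small Python sketch that applies the same sort score = category + rating formula to the sample docs from the scenario. The names `DOCS` and `ranked` are made up for this illustration; it only mimics the arithmetic, not how Elasticsearch breaks ties.

```python
# Sample docs from the scenario as (category, rating) pairs.
DOCS = [
    ("A", 10), ("A", 9), ("A", 5),
    ("B", 8), ("B", 5), ("B", 4), ("B", 1),
    ("C", 9), ("C", 8), ("C", 3), ("C", 2),
]


def ranked(weights):
    """Sort descending by (category weight + rating); return labels like 'A10'."""
    ordered = sorted(DOCS, key=lambda d: weights[d[0]] + d[1], reverse=True)
    return [f"{cat}{rating}" for cat, rating in ordered]


# Option 1: category dominates, so all As come first, then all Bs, then all Cs.
large = ranked({"A": 300, "B": 200, "C": 100})

# Option 2: category and rating have the same magnitude and interfere.
same = ranked({"A": 3, "B": 2, "C": 1})

# Option 3: rating dominates, category only orders docs within one rating.
small = ranked({"A": 0.3, "B": 0.2, "C": 0.1})

for name, order in (("large", large), ("same", same), ("small", small)):
    print(name, order)
```

None of the three orderings gives the repeating A, B, C pattern across ratings, which is the point of the comparison above.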
As I mentioned above, you're at the point where you'd need to traverse the list of hits and just reorder the docs. I'd sort by rating first, category second. Then for all 10s, pick the next A, then the next B, then the next C.
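The reorder described here could look roughly like this on the application side. This is only a sketch under my own assumptions: hits are plain dicts with `category` and `rating` keys, the category order is fixed as A, B, C, and `interleave_by_rating` is a hypothetical helper name, not anything from Elasticsearch.

```python
from itertools import groupby, zip_longest


def interleave_by_rating(hits, categories=("A", "B", "C")):
    """Sort by rating (desc), then round-robin the categories within each rating.

    Assumes every hit's category is listed in `categories`; hits with any
    other category would be dropped by this sketch.
    """
    # Rating first, category second.
    ordered = sorted(hits, key=lambda h: (-h["rating"], h["category"]))
    result = []
    for _, group in groupby(ordered, key=lambda h: h["rating"]):
        group = list(group)
        # One queue per category, in the fixed category order.
        queues = [[h for h in group if h["category"] == c] for c in categories]
        # Take the next A, then the next B, then the next C, until all are used up.
        for picks in zip_longest(*queues):
            result.extend(h for h in picks if h is not None)
    return result


# Hypothetical hits with several docs sharing a rating.
hits = [
    {"category": "A", "rating": 10},
    {"category": "A", "rating": 10},
    {"category": "B", "rating": 10},
    {"category": "C", "rating": 10},
    {"category": "B", "rating": 10},
    {"category": "A", "rating": 9},
    {"category": "B", "rating": 9},
]
labels = [f'{h["category"]}{h["rating"]}' for h in interleave_by_rating(hits)]
print(labels)
```

Within the 10s this yields A, B, C, A, B instead of A, A, B, B, C, and then moves on to the 9s.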
I personally wouldn't write this in Painless though. In this limited theoretical case I'd just fetch the docs and reorder them in the application. Depending on what other factors you want to include in the scoring, Elasticsearch may still be the best place for it. Ease of allocating resources matters too: you can start more ES nodes easily, so if your application can't handle the compute or memory requirements at the scale of your data, and adding capacity to it is harder than adding it to ES, then I would do the forced reorder in ES.
This isn't a question of compound queries. You can't just fetch all As because you need A-B-C context. If you fetch only all Rating = 10, there's still no way I can see to produce the A-B-C pattern rather than an A-A-B-B-C-C pattern, except "manual" reordering.
EDIT:
Document setup for the scenario given in OP. Scoring for my third option.
In Kibana Dev Console:
PUT index1
{
  "mappings": {
    "properties": {
      "category": {"type": "keyword"},
      "rating": {"type": "integer"}
    }
  }
}
POST index1/_doc
{"category": "A", "rating": 10}
POST index1/_doc
{"category": "A", "rating": 9}
POST index1/_doc
{"category": "A", "rating": 5}
POST index1/_doc
{"category": "B", "rating": 8}
POST index1/_doc
{"category": "B", "rating": 5}
POST index1/_doc
{"category": "B", "rating": 4}
POST index1/_doc
{"category": "B", "rating": 1}
POST index1/_doc
{"category": "C", "rating": 9}
POST index1/_doc
{"category": "C", "rating": 8}
POST index1/_doc
{"category": "C", "rating": 3}
POST index1/_doc
{"category": "C", "rating": 2}
GET index1/_search
{
  "query": {
    "match_all": {}
  },
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "lang": "painless",
        "source": "double cat_num; if (doc['category'].value == 'A') { cat_num = 0.3; } else if (doc['category'].value == 'B') { cat_num = 0.2; } else if (doc['category'].value == 'C') { cat_num = 0.1; } else { cat_num = 0; } return doc['rating'].value + cat_num;"
      },
      "order": "desc"
    }
  }
}