Hey!
We have an index with documents that look like so:
{
"id": "1",
"author_id": "8",
"popularity": 2.5,
"tags": ["illustration", "book", "image"]
}
We search this index for documents matching given tags and sort them using a script_score (i.e. using popularity and other variables). In some cases, though, almost all top-ranking documents are from the same author, which is undesirable.
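Therefore, we're looking for a way to decrease a document's score based on its author-occurrence-count, i.e. how many documents by the same author have appeared in the results up to and including that document. For example, given the following results: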
[
{"id": "1", "author_id": 8, ...}, # author-occurrence-count: 1
{"id": "5", "author_id": 8, ...}, # author-occurrence-count: 2
{"id": "7", "author_id": 7, ...}, # author-occurrence-count: 1
{"id": "3", "author_id": 8, ...}, # author-occurrence-count: 3
...
]
We could re-score the result set to include the author-occurrence-count in the script_score and apply a monotonically decreasing function. The real unknown is how we can get the Elasticsearch query to return this type of variable so that it is available in a rescoring query, and whether this is even possible!
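To make the idea concrete, here is a minimal client-side sketch in Python of the kind of penalty we have in mind. It is only an illustration and runs outside Elasticsearch, not inside a rescoring query. The function name, the `decay` factor, the `score` field, and the sample scores are all made up for the example.

```python
from collections import defaultdict


def penalize_repeated_authors(hits, decay=0.5):
    """Demote each additional document by the same author.

    hits:  list of dicts with "id", "author_id" and "score",
           already sorted by score in descending order.
    decay: hypothetical factor in (0, 1]; the n-th document by an
           author has its score multiplied by decay ** (n - 1),
           which is monotonically decreasing in n.
    """
    seen = defaultdict(int)  # author_id -> occurrences so far
    rescored = []
    for hit in hits:
        seen[hit["author_id"]] += 1
        count = seen[hit["author_id"]]
        rescored.append({
            **hit,
            "author_occurrence_count": count,
            "adjusted_score": hit["score"] * decay ** (count - 1),
        })
    # Re-rank by the penalized score.
    return sorted(rescored, key=lambda h: h["adjusted_score"], reverse=True)


# Same ordering as the example results above; the scores are invented.
hits = [
    {"id": "1", "author_id": 8, "score": 10.0},
    {"id": "5", "author_id": 8, "score": 9.0},
    {"id": "7", "author_id": 7, "score": 8.0},
    {"id": "3", "author_id": 8, "score": 7.0},
]
reranked = penalize_repeated_authors(hits)
```

With these made-up scores, document 7 (author 7) moves ahead of documents 5 and 3, since the second and third hits by author 8 are scaled down. What we'd like is to get the same effect inside the query itself.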
Thanks!
Charles