Rescoring documents based on author occurrence


We have an index with documents that look like so:

    {
      "id": "1",
      "author_id": "8",
      "popularity": 2.5,
      "tags": ["illustration", "book", "image"]
    }

We search this index for documents matching the given tags and sort them using a script_score (e.g. based on popularity and other variables). In some cases, though, almost all of the top-ranking documents come from the same author, which is undesirable.
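For reference, a search of this kind might look roughly like the body below. This is a minimal sketch, not our actual query: `build_tag_query` is a hypothetical helper, it assumes `tags` and `popularity` are mapped as in the example document, and `_score * doc['popularity'].value` is only a placeholder for the real scoring script.

```python
import json


def build_tag_query(tags, size=10):
    """Hypothetical helper: build a script_score search body that matches
    any of the given tags and scales the relevance score by popularity."""
    return {
        "size": size,
        "query": {
            "script_score": {
                # Match documents that carry at least one of the requested tags.
                "query": {"terms": {"tags": tags}},
                # Placeholder formula; the real script combines more variables.
                "script": {"source": "_score * doc['popularity'].value"},
            }
        },
    }


body = build_tag_query(["illustration", "book"])
print(json.dumps(body, indent=2))
```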

Therefore, we're looking for a way to decrease a document's score based on its author-occurrence-count, i.e. the number of times its author has appeared so far in the ranked results. For example, given the following results:

    {"id": "1", "author_id": 8, ...}, # author-occurrence-count: 1
    {"id": "5", "author_id": 8, ...}, # author-occurrence-count: 2
    {"id": "7", "author_id": 7, ...}, # author-occurrence-count: 1
    {"id": "3", "author_id": 8, ...}, # author-occurrence-count: 3

We could then re-score the result set by including the author-occurrence-count in the script_score and applying a monotonically decreasing function to it. The real unknown is how we can get Elasticsearch to return this kind of variable so that it is available in a rescoring query, and whether this is even possible at all.
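As far as we can tell, a rescore script scores each hit on its own and cannot see which authors ranked above it. If that holds, a fallback is to fetch the top N hits and re-rank them in application code. The sketch below does this under stated assumptions: `rerank_by_author` is a hypothetical helper, the hits are plain dicts with `id`, `author_id` and `score` that are already sorted by score, and `decay ** (count - 1)` with `decay=0.5` is just one possible monotonically decreasing function.

```python
from collections import Counter


def rerank_by_author(hits, decay=0.5):
    """Hypothetical helper: penalise repeated authors in an already ranked list.

    The k-th hit from the same author (its author-occurrence-count) has its
    score multiplied by decay ** (k - 1), which decreases monotonically in k
    for 0 < decay < 1. Counts follow the original ranking order.
    """
    seen = Counter()
    rescored = []
    for hit in hits:
        seen[hit["author_id"]] += 1
        occurrence = seen[hit["author_id"]]  # author-occurrence-count
        rescored.append({**hit, "occurrence": occurrence,
                         "score": hit["score"] * decay ** (occurrence - 1)})
    # Sort by the adjusted score; Python's sort is stable for ties.
    return sorted(rescored, key=lambda h: h["score"], reverse=True)


# Made-up scores for the four example hits above.
hits = [
    {"id": "1", "author_id": 8, "score": 4.0},
    {"id": "5", "author_id": 8, "score": 3.5},
    {"id": "7", "author_id": 7, "score": 3.0},
    {"id": "3", "author_id": 8, "score": 2.5},
]
reranked = rerank_by_author(hits)
print([h["id"] for h in reranked])  # ['1', '7', '5', '3']
```

With these made-up scores, the second and third hits from author 8 drop below the first hit from author 7. The trade-off is that only the fetched window is re-ranked, so N must be large enough to contain some other authors.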

