I'm trying to find a way to prevent multiple posts by the same author from
appearing in search results. So far I've tried random scoring, which lets me
keep pagination. However, the same author can still appear up to 4 times in
a given page of 10 results.
Is there any way to score a document based on how many times a certain
field value occurs in the result set? As far as I'm aware, you cannot
persist a variable or object in a scoring script.
I've looked into several ways of accomplishing this, but most of them have
significant drawbacks. One is to remove the duplicates and query again for a
new set of results that excludes the authors already shown. However, that
second set can itself contain several posts by the same author, so I'm left
querying one by one to replace each duplicate. This also breaks deep
pagination, because the result set used to replace duplicates eventually
runs out of pages before the standard search does.
I've also tried aggregations, which are not pageable.
Is there any functionality to spread out documents by the same author (or
other field value), or to reduce a document's score based on how many times
documents with the same author occur in the results?
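To make the idea concrete, here is a rough client-side sketch in Python of
the behaviour I'm after. It is not an Elasticsearch feature: the hit dicts,
the `spread_authors` function, the `penalty` factor and the `author` and
`score` keys are all made up for illustration. It only reorders hits that
have already been fetched, so it does not solve the pagination problem. It
just shows the kind of per-author score reduction I would like to express
at query time.

```python
from collections import Counter


def spread_authors(hits, penalty=0.5, author_field="author"):
    """Greedy re-rank of already-fetched hits (hypothetical helper).

    At each step, pick the remaining hit with the highest adjusted score,
    where adjusted = score * penalty ** (times its author was already picked).
    """
    remaining = list(hits)
    seen = Counter()  # author -> number of times already placed
    ranked = []
    while remaining:
        best = max(
            remaining,
            key=lambda h: h["score"] * penalty ** seen[h[author_field]],
        )
        remaining.remove(best)
        seen[best[author_field]] += 1
        ranked.append(best)
    return ranked


# Made-up hits, sorted by original score; author "a" dominates the top.
hits = [
    {"id": 1, "author": "a", "score": 10.0},
    {"id": 2, "author": "a", "score": 9.0},
    {"id": 3, "author": "b", "score": 8.0},
    {"id": 4, "author": "a", "score": 7.0},
    {"id": 5, "author": "c", "score": 6.0},
]

print([h["id"] for h in spread_authors(hits)])  # → [1, 3, 5, 2, 4]
```

With a penalty of 0.5, each repeat appearance of an author halves the
effective score of that author's remaining posts, so other authors get
interleaved before the same one shows up again.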