I have documents for which I calculate a function_score with the search query below. The scoring seems to work well, yet some documents get the same score. This causes some results to repeat and others to never come back. In essence I am sorting by score. I wonder if I can apply a secondary sort to the results so that the order (i.e. the scoring) is unique. For example, each document has a unique string, and I would like to order groups of documents with the same score by it.
When you say applying a sort did not work, can you explain why not? It is a common pattern to sort on a document's _id as the secondary sort order, as a tie breaker, if you wish to get a consistent ordering.
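To make the tie-breaker pattern concrete, here is a minimal sketch of a search body that sorts by score first and by a unique field second. The helper name `build_search_body` and the field name `uniqueKey` are hypothetical (the thread does not name the unique string field), and the field is assumed to be sortable, for example a keyword field. The `function_score` query shown is only a placeholder for the real one.

```python
def build_search_body(query, tie_breaker_field="uniqueKey"):
    """Wrap a query in a search body sorted by score, then by a unique field.

    `tie_breaker_field` is a hypothetical name; substitute the field that
    holds the unique string (or `_id`, as suggested above).
    """
    return {
        "query": query,
        "sort": [
            {"_score": {"order": "desc"}},          # primary: relevance score
            {tie_breaker_field: {"order": "asc"}},  # secondary: only used on ties
        ],
    }


# Placeholder query; the real function_score definition goes here.
body = build_search_body({"function_score": {"query": {"match_all": {}}}})
```

The resulting dictionary can be sent as the request body of a search call. The secondary sort only matters for documents whose scores are exactly equal.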
I've tried this type of sorting before, but after reading your recommendation I tried again. It seems that this function is the culprit: when I comment out this scoring method there are no duplicates in the results. The values are all at (or around) the same time, so I understand why there are duplicates, but I don't understand why the secondary sort doesn't take care of that.
I'm guessing that what's happening is this: all documents have a slightly different value for createdAt. As a result they get a different score. Even if two documents differ by just a few milliseconds, the score is going to be slightly different, and as a result the secondary sort order never comes into play.
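The point about the secondary sort can be illustrated outside Elasticsearch with a small sketch. It uses plain Python sorting on made-up scores and ids, not Lucene's actual scoring, and only shows the general rule that a tie-breaker is consulted when the primary keys are exactly equal.

```python
def rank(docs):
    """Order ids by score descending, then by id ascending as a tie-breaker."""
    ordered = sorted(docs, key=lambda d: (-d["score"], d["id"]))
    return [d["id"] for d in ordered]


# Made-up values: scores that differ very slightly, as when createdAt
# differs by a few milliseconds.
nearly_equal = [
    {"id": "b", "score": 0.9000001},
    {"id": "a", "score": 0.9000000},
]

# Made-up values: scores that are exactly equal.
exactly_equal = [
    {"id": "b", "score": 0.9},
    {"id": "a", "score": 0.9},
]

order_nearly_equal = rank(nearly_equal)    # score decides; id is never consulted
order_exactly_equal = rank(exactly_equal)  # id decides
```

In the first list the document with id "b" comes first purely because of its marginally higher score. Only in the second list does the id determine the order.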
Would it be possible to reindex your documents with less precise values for createdAt? For example, if you round createdAt down to the hour, then all documents that were created within the same hour will get the same score. Then you will be able to get your desired deterministic sort order by using the _id as the secondary sort criterion.
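As a rough sketch of the rounding step, assuming createdAt is available as epoch milliseconds before reindexing, the value can be floored to the start of its hour like this. The function name `floor_to_hour` and the sample timestamp are illustrative only.

```python
HOUR_MS = 60 * 60 * 1000  # milliseconds in one hour


def floor_to_hour(epoch_ms: int) -> int:
    """Round an epoch-millisecond timestamp down to the start of its hour."""
    return epoch_ms - (epoch_ms % HOUR_MS)


# Illustrative timestamps: two documents created 5 ms apart.
created_a = 1_700_000_123_456
created_b = created_a + 5

rounded_a = floor_to_hour(created_a)
rounded_b = floor_to_hour(created_b)
same_bucket = rounded_a == rounded_b
```

Documents whose rounded values match feed the same input into the scoring function, so the secondary sort can then separate them. Two documents a few milliseconds apart can still land in different hours if they straddle an hour boundary.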
Rounding down to an hour can work, but the order will eventually change. Why not compare to epoch time somehow (ascending order?). That seems to make more sense stability-wise.
Maybe I'm misunderstanding your issue. Internally, Elasticsearch stores dates as epoch milliseconds. My assumption was that what you are seeing is caused by documents not having the exact same value for createdAt. As a result, those documents all get a different score.