The problem you are experiencing is due to distributed search. The IDF
values are calculated per shard, so scores can change depending on which
shard the document is located on. If you look closely, the documents with
the same score are all on the same shard.
This problem normally shows up when you have a small number of documents
spread across several shards. With millions of documents, the term
statistics even out across shards and the difference becomes much smaller.
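To make the effect concrete, here is a rough sketch of how per-shard IDF can drift from the index-wide value. It assumes the BM25-style IDF formula and made-up document counts for two shards; your similarity setting and real numbers will differ, so treat it as an illustration only.

```python
import math

def bm25_idf(doc_count, doc_freq):
    """BM25-style IDF (assumed formula; the actual similarity may differ)."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

# Hypothetical shards: (documents on the shard, documents containing the term)
shards = {"shard_0": (5, 1), "shard_1": (5, 4)}

# IDF as each shard sees it when scoring with only its local statistics
per_shard_idf = {name: bm25_idf(n, df) for name, (n, df) in shards.items()}

# IDF computed from statistics summed over all shards
total_docs = sum(n for n, _ in shards.values())
total_df = sum(df for _, df in shards.values())
global_idf = bm25_idf(total_docs, total_df)

for name, idf in per_shard_idf.items():
    print(f"{name}: idf={idf:.3f}")
print(f"global: idf={global_idf:.3f}")
```

With so few documents, the same term gets a noticeably different IDF on each shard, so otherwise identical documents end up with different scores depending on where they live.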
One option is to use the dfs_query_then_fetch search type, which collects
term statistics from all shards before scoring:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-search-type.html#dfs-query-then-fetch
There is a slight performance hit, but it should help with the problem.
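As a minimal sketch, the search type is passed as a URL parameter on the search request. The host, index name, and query below are placeholders I made up, and the request is only built here, not sent:

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

def build_dfs_search(base_url, index, query):
    """Build (but do not send) a search request using dfs_query_then_fetch."""
    params = urlencode({"search_type": "dfs_query_then_fetch"})
    url = f"{base_url}/{index}/_search?{params}"
    body = json.dumps({"query": query}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical host, index, and query; substitute your own.
req = build_dfs_search(
    "http://localhost:9200",
    "my_index",
    {"match": {"title": "example"}},
)
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it to a running cluster. If you use a client library, it should expose the same `search_type` setting.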
Cheers,
Ivan