Hello,
I'm trying to figure out why my documents are ranked the way they are, so here are two samples which are puzzling me.
The top result has docFreq = 45, docCount = 476 and weight(Synonym(descRu:лн descRu:ол descRu:олн) in 313).
The second one has docFreq = 64, docCount = 527 and weight(Synonym(descRu:лн descRu:ол descRu:олн) in 324).
It doesn't make sense to me, because I expected the IDF to be the same since it's the same term.
It's likely that they came from different shards. Term frequencies and IDFs are computed on a shard-local basis, which allows the search to happen in a coordination-free environment (the shards don't have to talk to each other and can execute in parallel).
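The two IDF values can be recomputed from the numbers in the explain output, which shows how the same term ends up with different weights. This is a minimal sketch that assumes the index uses the default BM25 similarity, so IDF follows Lucene's `BM25Similarity` formula `log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))`. The term is relatively rarer on the first shard (45/476, about 9.5% of documents) than on the second (64/527, about 12.1%), so it gets a higher IDF there.

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """IDF as in Lucene's BM25Similarity (assumed to be the similarity in use)."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Shard-local statistics copied from the two explain outputs in the question.
top_hit_idf = bm25_idf(doc_freq=45, doc_count=476)
second_hit_idf = bm25_idf(doc_freq=64, doc_count=527)

print(f"top hit idf:    {top_hit_idf:.4f}")     # 2.3498
print(f"second hit idf: {second_hit_idf:.4f}")  # 2.1024
```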
Generally, this works fine because there is "enough data" to smooth out the discrepancies in TF/IDF, and scoring ends up being similar. But with few documents, or with documents that aren't randomly distributed (e.g. when using custom routing), you can run into more severe differences.
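For comparison, here is what a single index-wide IDF would look like if the statistics were pooled instead of kept per shard. Elasticsearch's `dfs_query_then_fetch` search type takes this approach, gathering term statistics from all shards before scoring, at the cost of an extra round-trip. This sketch is hypothetical: it assumes the index has exactly two shards holding the counts from the question. It also treats each reported `docFreq` like a single term's, even though the synonym query combines three terms. Pooling yields one IDF that lies between the two shard-local values and applies equally to every document.

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """IDF as in Lucene's BM25Similarity (assumed to be the similarity in use)."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical index with exactly two shards, using the per-shard
# statistics from the question as (docFreq, docCount).
shard_stats = {"shard_a": (45, 476), "shard_b": (64, 527)}

# What each shard computes on its own (the default, coordination-free behaviour).
local_idf = {name: bm25_idf(df, dc) for name, (df, dc) in shard_stats.items()}

# Index-wide statistics: sum docFreq and docCount over all shards first.
total_df = sum(df for df, _ in shard_stats.values())
total_dc = sum(dc for _, dc in shard_stats.values())
global_idf = bm25_idf(total_df, total_dc)

for name, idf in local_idf.items():
    print(f"{name}: local idf {idf:.4f}")
print(f"index-wide idf: {global_idf:.4f}")  # 2.2158
```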