Is your environment sharded? If your documents are in different shards, the
same terms might have different IDF values. If you have just a few test
documents, scores will be more consistent in a single shard environment.
Also, in your last query, the first document probably has a higher score
because of length normalization. By default, a match in a shorter field
scores higher than a match in a longer field, even if the term frequency
is the same.
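To make the length normalization point concrete, here is a rough sketch. It assumes the classic Lucene TF/IDF similarity, where the field norm is 1 / sqrt(number of terms) and the term-frequency factor is sqrt(frequency). The two example fields are made up, and the values in a real explain output will differ slightly because Lucene stores the norm in a lossy compressed form.

```python
import math

def field_norm(num_terms: int) -> float:
    """Classic TF/IDF length norm: 1 / sqrt(number of terms in the field)."""
    return 1.0 / math.sqrt(num_terms)

def tf(freq: int) -> float:
    """Classic term-frequency factor: sqrt(frequency)."""
    return math.sqrt(freq)

# Two hypothetical fields; both contain the search term "fox" exactly once.
short_field = "quick fox".split()                                    # 2 terms
long_field = "the quick brown fox jumps over the lazy dog".split()   # 9 terms

for name, terms in (("short", short_field), ("long", long_field)):
    partial = tf(terms.count("fox")) * field_norm(len(terms))
    print(f"{name}: tf * norm = {partial:.3f}")
# short: tf * norm = 0.707
# long: tf * norm = 0.333
```

Same term frequency, but the shorter field contributes roughly twice as much to the score.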

Thank you for your help. I found out that IDF is the reason for these strange results. I think I was using only 1 shard the whole time. Is it possible for the same term to get different IDF values even if only 1 shard is used?

The IDF should be the same for each document in the shard. The IDF
weights the search term, not the document. It sounds like your documents
are in different shards. You can check the number of shards with the GET
settings API.
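As a rough sketch, the shard count can be read from the response of GET /<index>/_settings. The host, the index name "myindex", and the sample response below are assumptions for illustration. The nested response shape is the one returned by more recent releases; older releases may return flattened keys such as "index.number_of_shards" instead.

```python
import json
from urllib.request import urlopen

def shard_count(settings_response: dict, index: str) -> int:
    """Pull number_of_shards out of a parsed GET /<index>/_settings response."""
    # The API returns the value as a string, e.g. "5".
    return int(settings_response[index]["settings"]["index"]["number_of_shards"])

def fetch_shard_count(index: str, host: str = "http://localhost:9200") -> int:
    """Query a running node; host and index name are assumptions."""
    with urlopen(f"{host}/{index}/_settings") as resp:
        return shard_count(json.load(resp), index)

# Sample response shape for illustration only (not real output).
sample = {
    "myindex": {
        "settings": {
            "index": {"number_of_shards": "5", "number_of_replicas": "1"}
        }
    }
}
print(shard_count(sample, "myindex"))  # prints 5
```

Against a live cluster you would call fetch_shard_count("myindex") instead of using the sample dictionary.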
Elasticsearch creates an index with five shards by default. If you run the
search with explain enabled, each hit will tell you which shard it came from.
You can also change the search type to dfs_query_then_fetch, which gathers
term statistics from all shards before scoring so that the IDF is the same
everywhere. There is a performance hit, since it needs another network round
trip between the coordinating node and the data nodes, but I think it is slight.
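To illustrate why the extra round trip helps, here is a small sketch comparing per-shard IDF with the global IDF that a DFS search would use. It assumes the classic Lucene formula idf = 1 + ln(numDocs / (docFreq + 1)), and the shard contents are invented numbers. On a real cluster the switch is made with the search_type=dfs_query_then_fetch request parameter, for example GET /myindex/_search?search_type=dfs_query_then_fetch (index name hypothetical).

```python
import math

def idf(doc_freq: int, num_docs: int) -> float:
    """Classic TF/IDF: idf = 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Hypothetical small index split over two shards:
# shard name -> (documents in shard, documents in shard containing the term)
shards = {"shard0": (5, 1), "shard1": (5, 4)}

# Default query_then_fetch: every shard scores with its own statistics.
local = {name: idf(df, n) for name, (n, df) in shards.items()}

# dfs_query_then_fetch: statistics are summed across shards before scoring.
total_docs = sum(n for n, _ in shards.values())
total_df = sum(df for _, df in shards.values())
global_idf = idf(total_df, total_docs)

for name, value in local.items():
    print(f"{name}: local idf = {value:.3f}")
print(f"global idf = {global_idf:.3f}")
```

With only a handful of test documents the per-shard values can differ a lot, which is why identical documents may score differently until the index grows or a DFS search is used.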