I need help understanding how ES merges scores across different
shards. As far as I know, each shard is a Lucene index instance, so
each shard has its own tf-idf normalization that is independent of the
other shards. Here are my questions. First, do the scores depend on how
the data is distributed across the shards? Second, does the ranking of
documents depend on how the data is distributed across the shards?
Third, what is the algorithm for merging scores across the shards?
Sorry if my questions are vague. Just ask and I'll clarify.
Thanks!
Hey, by default, the scores from each shard are used as is, and the merged
results are sorted by score. So the scores use tf-idf as it is computed on
each shard, which means they depend on how the data is distributed.
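
To make that concrete, here is a toy sketch in Python, not Elasticsearch's actual code. It assumes a simplified score of sqrt(tf) * idf with idf = 1 + ln(numDocs / (docFreq + 1)), in the spirit of Lucene's classic tf-idf similarity; the function names and the sample shards are made up for illustration. Each shard computes idf from its own documents only, and the merge step just concatenates the per-shard hits and sorts by score.

```python
import math


def idf(num_docs, doc_freq):
    # Simplified classic tf-idf style idf, computed from one shard's statistics.
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def score_shard(docs, term):
    """Score one shard. docs maps doc_id -> list of tokens.

    Returns a list of (score, doc_id) for documents containing the term.
    """
    num_docs = len(docs)
    doc_freq = sum(1 for tokens in docs.values() if term in tokens)
    term_idf = idf(num_docs, doc_freq)
    hits = []
    for doc_id, tokens in docs.items():
        tf = tokens.count(term)
        if tf:
            hits.append((math.sqrt(tf) * term_idf, doc_id))
    return hits


def merge_hits(per_shard_hits, size=10):
    # Scores are taken as is from each shard and sorted together, highest first.
    all_hits = [hit for hits in per_shard_hits for hit in hits]
    all_hits.sort(key=lambda hit: hit[0], reverse=True)
    return all_hits[:size]


# Hypothetical data: "lucene" is common on shard A and rare on shard B.
shard_a = {
    "a1": ["lucene", "index"],
    "a2": ["lucene", "shard"],
    "a3": ["lucene", "score"],
}
shard_b = {
    "b1": ["lucene", "index"],
    "b2": ["hadoop"],
    "b3": ["elephant"],
}

merged = merge_hits([score_shard(shard_a, "lucene"), score_shard(shard_b, "lucene")])
for score, doc_id in merged:
    print(f"{doc_id}: {score:.4f}")
```

Here a1 and b1 have identical content, but b1 ranks first because the term is rarer on its shard, so both the score and the rank depend on where the documents live. If that matters for your data, Elasticsearch has a `dfs_query_then_fetch` search type that gathers term statistics from all shards before scoring, at the cost of an extra round trip.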