Score and relevance across the shards


(konstantin) #1

Hi guys,

I need help understanding how ES merges scores across different
shards. As far as I know, each shard is a Lucene index instance, so each
shard has its own tf-idf normalization that is independent of the other
shards. So, here is the first question: does the score depend on how
data is distributed across the shards? Second, does the rank of documents
depend on how data is distributed across the shards? Third, what is the
algorithm for merging scores across the shards?

Sorry if my questions are vague. Just ask me and I'll clarify the
points.
Thanks!


(Shay Banon) #2

Hey, by default the scores from each shard are used as is, and the merged
results are sorted by them. So the scores use the tf-idf as it is computed on
each shard, which means they (and so the ranking) depend on how the data is
distributed.
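
To make the effect concrete, here is a small Python sketch, not ES code. It assumes the classic Lucene-style idf formula, 1 + ln(numDocs / (docFreq + 1)), and the shard statistics, document ids, and the merge_hits helper are all made up for illustration. It shows the same term getting a different idf on each shard, and a plain sort-by-score merge of the per-shard hits:

```python
import math

def idf(doc_freq, num_docs):
    # Classic Lucene-style idf (assumed here): 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Hypothetical per-shard statistics for a single query term.
shards = {
    "shard0": {"num_docs": 1000, "doc_freq": 10},   # term is rare here
    "shard1": {"num_docs": 1000, "doc_freq": 400},  # term is common here
}

# Each shard computes idf from its own statistics only.
local_idf = {name: idf(s["doc_freq"], s["num_docs"]) for name, s in shards.items()}

# What the idf would be if frequencies were aggregated across shards first.
global_idf = idf(
    sum(s["doc_freq"] for s in shards.values()),
    sum(s["num_docs"] for s in shards.values()),
)

def merge_hits(per_shard_hits, size):
    # Hypothetical merge step: take (score, doc_id) pairs from every shard
    # and keep the top `size` by score, highest first.
    all_hits = [hit for hits in per_shard_hits for hit in hits]
    return sorted(all_hits, key=lambda h: h[0], reverse=True)[:size]

# Two otherwise identical documents (tf = 1), one on each shard.
hits = [
    [(local_idf["shard0"], "doc-a")],
    [(local_idf["shard1"], "doc-b")],
]
merged = merge_hits(hits, size=2)

for score, doc_id in merged:
    print(f"{doc_id}: {score:.3f}")
print(f"idf with aggregated frequencies: {global_idf:.3f}")
```

With these made-up numbers doc-a outranks doc-b only because the term happens to be rarer on its shard; with aggregated frequencies both documents would get the same idf.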

If you want, you can set the search type to one that adds a DFS phase, which
first collects term frequencies aggregated across the shards, but that will
mean slower searches. See here:
http://www.elasticsearch.org/guide/reference/api/search/search-type.html.
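
As a rough sketch of what setting the search type looks like, the Python below only builds the request URL and body and sends nothing. The search_type value dfs_query_then_fetch is one of the documented search types; the host, the index name myindex, the field title, and the build_search_request helper are assumptions for illustration:

```python
import json
from urllib.parse import urlencode

def build_search_request(base_url, index, query, search_type="dfs_query_then_fetch"):
    # Hypothetical helper: returns the URL and JSON body for a search request.
    # search_type is passed as a query-string parameter on the _search endpoint.
    params = urlencode({"search_type": search_type})
    url = f"{base_url}/{index}/_search?{params}"
    body = json.dumps({"query": query})
    return url, body

# Assumed local node, index name, and field name.
url, body = build_search_request(
    "http://localhost:9200",
    "myindex",
    {"term": {"title": "shard"}},
)
print(url)
print(body)
```

The URL and body can then be sent with any HTTP client. Leaving search_type out gives the default behavior, where each shard scores with its own frequencies.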

On Tue, Sep 13, 2011 at 4:14 PM, konstantin
konstantin.selivanov@gmail.com wrote:

