Explain API - All these statistics are per shard not per index

I am going through this thread.

Somewhere in it, it is said:

All these statistics are per shard not per index

I assume this statement is still valid, and I have confirmed this behavior too.

What was the reasoning behind making the statistics per shard (not per index)? It seems to defeat the purpose of a distributed system, which I would expect to calculate relevance globally by looking across all shards.

Relevance in Elasticsearch is calculated per shard, not per index.

Thanks a lot, I mentioned the same. I would be interested to know the reasoning behind that, instead of calculating it per index.

Because an index's shards are not necessarily on the same node, which means you would need to ship all the documents to a single node to score them relative to each other.

In practice (considering real-time scenarios and the distributed nature of Elasticsearch), the documents would be spread across the nodes; I guess that is the whole purpose of distributed systems like ES.

However, the following statement seems to defeat the whole purpose of that distributed nature:

you would need to ship all the documents to a single node to relatively score them.

Exactly :slight_smile:

I'm not sure you understood my question: what is the reasoning behind doing it per shard?

I think we're going in circles here :wink:

As Elasticsearch is a distributed system, you cannot guarantee that all shards of a given index will be on the same node when a query is processed. If you wanted to do it on an index level, you would need to ship all data from the index to a single node to calculate the relevance.

So you either have a single monolithic system with index-level scoring, or a distributed one with shard-level scoring. There are costs and benefits to both.
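To make the shard-level effect concrete, here is a minimal Python sketch, assuming a simplified model where each shard computes a BM25-style IDF (the formula Lucene uses) from only the documents it holds. The two shards and their documents are hypothetical, and term frequency and field-length normalization are left out.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """BM25-style IDF as in Lucene: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def shard_local_idf(shards: list[list[set[str]]], term: str) -> list[float]:
    """Each shard computes IDF from its own documents only (no global view)."""
    idfs = []
    for docs in shards:
        doc_freq = sum(1 for doc in docs if term in doc)
        idfs.append(bm25_idf(len(docs), doc_freq))
    return idfs


# Hypothetical index with two shards; each document is modeled as a set of terms.
shards = [
    [{"elasticsearch", "shard"}, {"elasticsearch"}, {"lucene"}],    # shard 0
    [{"elasticsearch", "node"}, {"cluster"}, {"node"}, {"index"}],  # shard 1
]

idfs = shard_local_idf(shards, "elasticsearch")
print([round(x, 2) for x in idfs])
```

With these toy numbers, "elasticsearch" gets an IDF of ln(1.6) on shard 0 but ln(10/3) on shard 1, so an otherwise identical match scores higher simply because of which shard the document landed on. With many documents spread evenly across shards, these local values tend to be close to the index-wide value, which is why shard-level scoring is usually acceptable.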

OK, so you mean it is mainly the operational complexity of doing global relevance scoring across all shards. So there is no other reason not to do global relevance scoring :slight_smile:

Have a look at the different search types available, especially dfs_query_then_fetch.
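The idea behind dfs_query_then_fetch can be sketched as a two-phase process: first every shard reports only its term statistics (document count and document frequency), then the coordinating node sums them and every shard scores with the same index-wide values. Only a few numbers per shard travel over the network, not the documents themselves. This is a minimal Python model under the same simplifying assumptions as before (BM25-style IDF only, hypothetical shards), not Elasticsearch's actual implementation.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """BM25-style IDF as in Lucene: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def shard_term_stats(docs: list[set[str]], term: str) -> tuple[int, int]:
    """Phase 1 ("dfs"): a shard reports only its counts, never its documents."""
    return len(docs), sum(1 for doc in docs if term in doc)


def global_idf(shards: list[list[set[str]]], term: str) -> float:
    """Coordinator sums per-shard counts and computes one index-wide IDF."""
    stats = [shard_term_stats(docs, term) for docs in shards]
    total_docs = sum(doc_count for doc_count, _ in stats)
    total_df = sum(doc_freq for _, doc_freq in stats)
    return bm25_idf(total_docs, total_df)


# Hypothetical index with two shards; each document is modeled as a set of terms.
shards = [
    [{"elasticsearch", "shard"}, {"elasticsearch"}, {"lucene"}],    # shard 0
    [{"elasticsearch", "node"}, {"cluster"}, {"node"}, {"index"}],  # shard 1
]

# Phase 2 ("query"): every shard would score with this single shared value.
idf = global_idf(shards, "elasticsearch")
print(round(idf, 3))
```

Here both shards use ln(16/7) for "elasticsearch" (7 documents in total, 3 containing the term), instead of two different local values. In Elasticsearch this is chosen per request with the search_type=dfs_query_then_fetch parameter on the _search endpoint; the price is an extra round trip to all shards before the query phase.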

Thanks for checking. Is it anything different from what @warkolm(https://discuss.elastic.co/u/warkolm) was referring to? #justasking
