Explain API - All these statistics are per shard not per index

nages · April 28, 2020, 1:51pm

I am going through this thread

Somewhere it is said

All these statistics are per shard not per index

I hope this statement is still valid; I confirmed this behavior too.

What was the reasoning behind making the statistics per shard (not per index)? It seems to defeat the purpose of a distributed system, which I would expect to calculate relevance globally by looking across all shards.

warkolm · April 28, 2020, 10:03pm

Relevance in Elasticsearch is calculated per shard, not per index.

nages · April 29, 2020, 2:21am

Thanks a lot, I mentioned the same. I would be interested to know the reasoning behind that, instead of calculating it per index.

warkolm · April 29, 2020, 2:46am

Because an index is not necessarily on the same node, which means you would need to ship all the documents to a single node to relatively score them.

nages · April 29, 2020, 3:10am

In practice (considering real-time scenarios and the distributed nature of Elasticsearch), the documents would be spread across the nodes. I guess that is the whole purpose of distributed systems like ES.

However, the following statement might defeat the whole purpose of that distributed nature:

you would need to ship all the documents to a single node to relatively score them.

warkolm · April 29, 2020, 3:20am

Exactly

nages · April 29, 2020, 4:13am

I'm not sure you understood my question: what is the reasoning behind doing it per shard?

warkolm · April 29, 2020, 4:32am

I think we're going in circles here

As Elasticsearch is a distributed system, you cannot guarantee that all shards of a given index will be on the same node when a query is processed. If you wanted to do it on an index level, you would need to ship all data from the index to a single node to calculate the relevance.

So you either have a single monolithic system with index-level scoring, or a distributed one with shard-level scoring. There are costs and benefits to both.
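To make the shard-level versus index-level difference concrete, here is a minimal sketch of how the same term can get a different IDF on each shard. It assumes the BM25 IDF formula used by Lucene, ln(1 + (N - n + 0.5) / (n + 0.5)), where N corresponds to docCount and n to docFreq in the explain output. The shard names and numbers are made up for illustration.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """BM25 IDF: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical statistics for one term on two shards: (docCount, docFreq).
shard_stats = {
    "shard_0": (1000, 10),   # term is rare on this shard
    "shard_1": (1000, 200),  # term is common on this shard
}

# Shard-level scoring: each shard only sees its own statistics.
per_shard_idf = {
    name: bm25_idf(doc_count, doc_freq)
    for name, (doc_count, doc_freq) in shard_stats.items()
}

# Index-level statistics are the sums of the per-shard statistics.
total_docs = sum(doc_count for doc_count, _ in shard_stats.values())
total_freq = sum(doc_freq for _, doc_freq in shard_stats.values())
global_idf = bm25_idf(total_docs, total_freq)

for name, idf in per_shard_idf.items():
    print(f"{name}: idf={idf:.3f}")
print(f"index-wide: idf={global_idf:.3f}")
```

With skewed data like this, a matching document on shard_0 gets a higher IDF contribution than an otherwise identical document on shard_1. With many documents spread evenly across shards, the per-shard values tend to converge and the difference usually matters less.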

nages · April 29, 2020, 11:01am

OK, so you mean global relevance scoring across all shards is mainly a matter of operational complexity. So there is no other reason not to do global relevance scoring?

Christian_Dahlqvist · April 29, 2020, 11:19am

Have a look at the different search types available, especially dfs_query_then_fetch.
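As a rough sketch of what such a request could look like, the snippet below only builds the path and JSON body for a search with `search_type=dfs_query_then_fetch` and `explain` enabled; it does not send anything. The index name `my-index`, the field `title`, and the helper `build_search_request` are hypothetical names chosen for illustration.

```python
import json
from urllib.parse import urlencode


def build_search_request(index, query, search_type="dfs_query_then_fetch", explain=True):
    """Return (path, body) for a _search call; nothing is sent over the network."""
    params = urlencode({"search_type": search_type})
    path = f"/{index}/_search?{params}"
    # "explain": true asks for a per-hit score explanation in the response.
    body = json.dumps({"explain": explain, "query": query})
    return path, body


# Hypothetical index and field names.
path, body = build_search_request("my-index", {"match": {"title": "shard"}})
print("GET", path)
print(body)
```

Running the same query once with the default `query_then_fetch` and once with `dfs_query_then_fetch`, then comparing the explain output, is one way to see whether the term statistics used for scoring change.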

nages · April 29, 2020, 11:23am

Thanks for checking. Is it anything different from what @warkolm (https://discuss.elastic.co/u/warkolm) is referring to? #justasking

system · May 27, 2020, 11:23am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
