Does an Elasticsearch data node full of replicas route requests to other nodes?

vorapoap · August 14, 2020, 10:23am

Suppose the cluster has 4 nodes and the index has 4 primary shards.
If each shard has 3 replicas, every node contains a copy of every shard.

When a search query is sent to a node, does it route the query to other nodes, or does it depend?
Compared to a setup where each node holds only a single shard with no replicas, which one is faster in terms of search speed and indexing speed?

spinscale · August 14, 2020, 10:27am

See the preference parameter of a search request, which also explains the default behaviour:

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-your-data.html#search-preference
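
As a minimal sketch of what that parameter looks like on a request, the Python below only builds the search URL and does not contact a cluster. The host `localhost:9200` and the index name `my-index` are placeholders. `_local` is one of the documented preference values; it asks the receiving node to use shard copies it holds itself where possible.

```python
from urllib.parse import urlencode


def search_url(host, index, preference=None):
    """Build a _search URL, optionally carrying a preference value."""
    url = f"http://{host}/{index}/_search"
    if preference is not None:
        url += "?" + urlencode({"preference": preference})
    return url


# Placeholder host and index name, for illustration only.
default_url = search_url("localhost:9200", "my-index")
local_url = search_url("localhost:9200", "my-index", "_local")

print(default_url)
print(local_url)
```

Without a preference value, the choice of shard copy is left to the default behaviour described in the linked docs.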

Steve_Mushero · August 14, 2020, 10:48am

You can test with the query from these docs, which, if you send it to your first node, shows which shards and nodes it'll use: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-shards.html
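
A rough sketch of how the output of that API could be summarized, assuming the response has a `shards` list with one inner list per shard and one entry per copy. The sample response below is hypothetical and heavily trimmed (node names such as `node-a` are made up); a real response carries more fields.

```python
from collections import defaultdict

# Hypothetical, trimmed _search_shards response for a 2-shard index.
sample_response = {
    "shards": [
        [
            {"index": "my-index", "shard": 0, "node": "node-a", "primary": True},
            {"index": "my-index", "shard": 0, "node": "node-b", "primary": False},
        ],
        [
            {"index": "my-index", "shard": 1, "node": "node-b", "primary": True},
            {"index": "my-index", "shard": 1, "node": "node-a", "primary": False},
        ],
    ]
}


def copies_by_shard(response):
    """Map each shard number to the nodes that hold a copy of it."""
    result = defaultdict(list)
    for shard_group in response["shards"]:
        for copy in shard_group:
            result[copy["shard"]].append(copy["node"])
    return dict(result)


placement = copies_by_shard(sample_response)
print(placement)
```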

Generally, unless you target a document ID/shard or use some other way to route to a single shard, the coordinating node (the one your client talks to) must do a scatter/gather operation and send the query to every shard of the index (one copy of each).

The question is how it chooses among, in your case, the 4 copies of each shard. This is called Adaptive Replica Selection, and there is a nice doc on that.

My reading of that is that, among other things, it uses past performance as an input, so in theory, if all queries were local and fast, it might pick the local shard copies and not bother with copies on other nodes.

But the queue also matters, so if the local node is busy, it may send to other nodes.
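
To make that intuition concrete, here is a toy model in Python. It is not the formula Elasticsearch uses; the scoring function is invented purely to show how past response time and queue length could pull the choice in different directions. All node names and numbers are made up.

```python
def pick_copy(copies):
    """Return the node whose shard copy has the lowest toy score.

    Each entry is (node, avg_response_ms, queue_size). The score is an
    invented illustration: response time inflated by the queue length.
    """
    def score(entry):
        _node, avg_ms, queue = entry
        return avg_ms * (1 + queue)

    return min(copies, key=score)[0]


# The local copy is faster, and nothing is queued anywhere.
idle = [("local", 5.0, 0), ("remote", 8.0, 0)]
# Same response times, but the local node now has 4 queued searches.
busy = [("local", 5.0, 4), ("remote", 8.0, 0)]

print(pick_copy(idle))
print(pick_copy(busy))
```

In this toy model the local copy wins while it is idle and loses once its queue grows, which matches the behaviour described above.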

vorapoap · August 14, 2020, 4:04pm

So a cluster setup where each node contains all the shards seems to have better search speed than one where each node contains a single shard. May we say that? (Assuming the cluster has only one index.)

vorapoap · August 14, 2020, 4:53pm

Oh, I found something interesting: with ?explain=true you can easily see that the results come from different nodes. Hmm, this is interesting.
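
A small sketch of how to tally that, assuming each hit carries `_shard` and `_node` fields when explanation is enabled on the search. The response below is a hypothetical, trimmed example with made-up node names, not real output.

```python
from collections import Counter

# Hypothetical, trimmed search response with explanation enabled.
sample_hits = {
    "hits": {
        "hits": [
            {"_id": "1", "_shard": "[my-index][0]", "_node": "node-a"},
            {"_id": "2", "_shard": "[my-index][1]", "_node": "node-b"},
            {"_id": "3", "_shard": "[my-index][1]", "_node": "node-b"},
        ]
    }
}


def hits_per_node(response):
    """Count how many returned hits were served by each node."""
    return Counter(hit["_node"] for hit in response["hits"]["hits"])


counts = hits_per_node(sample_hits)
print(counts)
```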

Steve_Mushero · August 15, 2020, 1:59am

Yeah, it's pretty dynamic and expects to run under heavy load, shifting queues, etc., so it likely moves around like any distributed dynamic system. I imagine it can also oscillate: this node may be loaded and slow, so it uses other nodes; then this node gets fast again, so it uses this node, and so on. But on balance, it all evens out.

I'm writing a blog on search data flow and will dig into the code to see how it does this.

vorapoap · August 17, 2020, 4:36am

That is very interesting to know. For example,
if we have 16 data nodes containing one single index:
A. 8 primary shards / 1 replica each
B. 8 primary shards / 15 replicas each

Which one would perform faster?
I would guess indexing speed is slower on B due to the many replicas.
But what about search speed, since the search query doesn't always run on a single server?

Cheers

Christian_Dahlqvist · August 17, 2020, 5:08am

As the number of primary shards is the same here, I would guess A would be faster, as each node holds less data, meaning most of it may be cached in memory.

If the data set is small enough to be cached in both scenarios I would also test a scenario C with 1 primary shard and 15 replica shards.

It is always a balance between query latency and query throughput (number of concurrent queries that can be served) and you likely need to benchmark to find out what is best for your particular use case.
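
The difference in data per node between the scenarios can be worked out with a little arithmetic. The sketch below assumes shard copies are spread evenly across the 16 nodes and that all shards are the same size; real clusters will only approximate this.

```python
def copies_per_node(primaries, replicas, nodes):
    """Average number of shard copies each node holds."""
    total_copies = primaries * (1 + replicas)
    return total_copies / nodes


def index_fraction_per_node(primaries, replicas, nodes):
    """Fraction of the whole index stored on each node.

    Each shard copy holds 1/primaries of the index (equal-size shards assumed).
    """
    return copies_per_node(primaries, replicas, nodes) / primaries


# (primary shards, replicas per primary, data nodes)
scenarios = {"A": (8, 1, 16), "B": (8, 15, 16), "C": (1, 15, 16)}

for name, (primaries, replicas, nodes) in scenarios.items():
    print(
        name,
        copies_per_node(primaries, replicas, nodes),
        index_fraction_per_node(primaries, replicas, nodes),
    )
```

Under these assumptions, each node in A holds one shard copy, which is one eighth of the index, while each node in B and C holds the full index (as eight shards in B, as a single shard in C).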

system · September 14, 2020, 5:09am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
