Adaptive Replica Selection - deeper details

Glen_Smith · May 13, 2021, 10:05pm

Hello,

If possible, I'd like to learn more specifically how shard ranking is calculated for a search.

The current documentation for Adaptive Replica Selection says:

By default, Elasticsearch uses adaptive replica selection to route search requests. This method selects an eligible node using shard allocation awareness and the following criteria:

Response time of prior requests between the coordinating node and the eligible node

How long the eligible node took to run previous searches

Queue size of the eligible node’s search threadpool

Do the first and second items have some overlap? Or is the first item the communication latency of the inter-node request and the second item a metric provided by the data node for the duration of search? Or are they something else that could be more precisely expressed? (Also, as stated, the first item seems to include latency of indexing requests, too. Is that the case?)

Are the first two items scoped down at all? That is, does the coordinating node have those criteria on a shard-by-shard or index-by-index basis, or is it for all requests sent to the candidate nodes?

If a coordinating node has determined that there are 8 participating shards for a given search, as it iterates through the shards determining which shard copy of each should be searched, does the selection for the first 7 influence the choice for the 8th (by, say, incrementing the apparent queue size for the destination nodes)? Or maybe it doesn't have to because those searches have been dispatched already and the search queue size will already reflect the new state?

Thanks in advance for any insight you can provide.

vincenbr · May 14, 2021, 7:40pm

Hi Glen,
I think this blog post pretty much answers your questions: "Improving Response Latency in Elasticsearch with Adaptive Replica Selection" (Elastic Blog).
I don't think there is a shard-level ranking (i.e. which primary or replica of a specific shard is the "best performing" in the cluster) but rather a node-level ranking, based on statistics from previously dispatched searches. And AFAIU only search requests are taken into account (not indexing requests, whose performance isn't necessarily comparable to that of search requests).
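To make the node-level ranking concrete, here is a minimal Python sketch of the C3-style rank formula as the blog post describes it: rank = R - 1/mu + q_hat^b / mu, with q_hat = 1 + outstanding * clients + queue size and b = 3. The function name, argument names, and the choice of milliseconds are illustrative assumptions, not Elasticsearch's actual code. A lower rank means a more attractive node.

```python
def ars_rank(response_ms, service_ms, queue_size, outstanding, clients, b=3):
    """Sketch of the C3-style rank for one candidate node (lower is better).

    response_ms : moving average of response time as seen by the coordinating node
    service_ms  : moving average of time the data node spent running searches (1/mu)
    queue_size  : moving average of the data node's search queue size
    outstanding : searches from this coordinating node still in flight to the node
    clients     : number of nodes acting as clients (assumed known)
    b           : queue adjustment exponent (3 in the C3 formula)
    """
    # Estimated queue: own in-flight requests scaled by client count, plus reported queue
    q_hat = 1 + outstanding * clients + queue_size
    # (response - service) approximates the time not spent executing the search itself
    return response_ms - service_ms + (q_hat ** b) * service_ms


# Hypothetical numbers: an idle node versus one with a single queued search
idle = ars_rank(response_ms=10.0, service_ms=4.0, queue_size=0, outstanding=0, clients=1)
busy = ars_rank(response_ms=10.0, service_ms=4.0, queue_size=1, outstanding=0, clients=1)
print(idle, busy)
```

In this formula, response time and service time play different roles: their difference stands in for everything outside the search execution itself (network, coordination), while the cubed queue estimate penalizes nodes that are already backed up.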
When I want to troubleshoot unbalanced search load on the cluster, I use GET _nodes/stats/adaptive_selection, which returns the ARS metrics and rank for each node.
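As a rough illustration of reading that output, the Python sketch below sorts the target nodes seen by one coordinating node by their rank. The response layout (`nodes` -> node id -> `adaptive_selection` -> target node id, with fields such as `avg_queue_size` and `rank`) is assumed from typical output and may differ between versions; the node ids and numbers in the sample are made up.

```python
def rank_targets(stats_response, coordinator_id):
    """Return (target_node_id, rank) pairs for one coordinating node, best first.

    Assumes each entry under "adaptive_selection" carries a "rank" value that
    may be serialized as a string, hence the float() conversion.
    """
    entries = stats_response["nodes"][coordinator_id].get("adaptive_selection", {})
    pairs = [(node_id, float(entry["rank"])) for node_id, entry in entries.items()]
    # Lower rank means the node is preferred for the next search
    return sorted(pairs, key=lambda pair: pair[1])


# Made-up sample shaped like a trimmed nodes-stats response
sample = {
    "nodes": {
        "coord-1": {
            "adaptive_selection": {
                "data-a": {"outgoing_searches": 0, "avg_queue_size": 0, "rank": "1.2"},
                "data-b": {"outgoing_searches": 2, "avg_queue_size": 3, "rank": "48.7"},
                "data-c": {"outgoing_searches": 0, "avg_queue_size": 1, "rank": "9.4"},
            }
        }
    }
}

ordered = rank_targets(sample, "coord-1")
for node_id, rank in ordered:
    print(node_id, rank)
```

Each node reports its own view of the others, so comparing the same target across several coordinating nodes can show whether one of them consistently sees a data node as slow.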

Glen_Smith · May 17, 2021, 3:24pm

Beautiful! Thanks for the details!

