Shard balancing questions

Raimon_Bosch · February 15, 2019, 2:29pm

Hi,

I am wondering what the advantage is of having primary shards spread across several nodes. Since you can use multithreading, isn't it more efficient to handle all the work of a single search query on the same node?

https://www.elastic.co/guide/en/elasticsearch/reference/current/shards-allocation.html#_shard_balancing_heuristics

What would be the consequences of designing a policy that concentrates all the primary shards of an index on the same node? Bear in mind that each index would go to a different node, and its replicas would be kept on another node.

Thanks in advance,

dadoonet · February 15, 2019, 2:41pm

Instead you can use preference=_local. See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-preference.html
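For reference, a minimal sketch of how such a request could be built, using only the Python standard library. The host `http://localhost:9200`, the index name `my-index` and the helper name `build_local_search` are assumptions made up for illustration; only the `preference=_local` query parameter comes from the suggestion above. The request is built but not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_local_search(base_url, index, query):
    """Build (but do not send) a search request that prefers local shard copies."""
    # preference=_local asks the coordinating node to prefer shard copies
    # it holds itself, whether they are primaries or replicas.
    params = urlencode({"preference": "_local"})
    url = f"{base_url}/{index}/_search?{params}"
    body = json.dumps({"query": query}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host and index name, for illustration only.
req = build_local_search("http://localhost:9200", "my-index", {"match_all": {}})
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it to a running cluster.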

Raimon_Bosch · February 15, 2019, 2:52pm

I'm not sure that will work, because in this use case the query has to be sent to all the shards to get an accurate result. I was looking more for a sharding policy that puts all the primary shards on the same node when possible. Maybe it's best not to use shards at all.

dadoonet · February 16, 2019, 8:23am

It will query all shards even if some of them are not available locally. It will just have a preference for the local ones.

But I'm wondering whether this is just a question you have or a real problem?
I mean that you should not really try to change the default behavior unless you have a real problem. Do you?

Raimon_Bosch · February 16, 2019, 8:50am

Ok, I'll try the param. The idea is to benchmark several configurations before going live.

Cheers,

DavidTurner · February 16, 2019, 10:25am

I think you are perhaps confused about the role of primary shards. From the point of view of a search they're just another shard copy, no different from any replica. The main difference between primaries and replicas is that primaries perform a small amount of extra coordination when indexing a document. Using preference=_local will try to keep a search on the local node regardless of whether the shards on the local node are primaries or replicas.

Raimon_Bosch · February 16, 2019, 10:42am

An algorithm that keeps the N shards of an index on the same node would work for us, whether they are all primaries, all replicas, or a mix.

I guess that for recovery purposes I would keep all the primaries together on one node and their replicas together on another, so the probability of losing data is lower.

So from that point of view, my interest has nothing to do with primaries or replicas, just with running the search multithreaded on a single node instead of spread across several nodes. But anyway, maybe Elasticsearch is designed so that the latencies are negligible in this situation.
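For benchmarking that single-node layout, one option is index-level shard allocation filtering. It is not mentioned elsewhere in this thread, so treat the following as a sketch under stated assumptions: the host, the index name `my-index`, the node name `node-1` and the helper name `build_pin_index_request` are hypothetical. Note that a replica is never allocated on the same node as its primary, so the sketch sets the replica count to 0 for the benchmark; otherwise the replicas would stay unassigned.

```python
import json
from urllib.request import Request


def build_pin_index_request(base_url, index, node_name):
    """Build (but do not send) a settings update that pins an index to one node."""
    settings = {
        # Require every shard of this index to live on the named node.
        "index.routing.allocation.require._name": node_name,
        # Assumption for a benchmark only: no replicas, since a replica
        # cannot share a node with its primary.
        "index.number_of_replicas": 0,
    }
    return Request(
        f"{base_url}/{index}/_settings",
        data=json.dumps(settings).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical host, index and node names, for illustration only.
pin_req = build_pin_index_request("http://localhost:9200", "my-index", "node-1")
print(pin_req.full_url)
```

Comparing search latency with and without this setting would be one way to test whether a single-node layout helps for a given workload.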

DavidTurner · February 16, 2019, 12:22pm

The primary/replica distinction doesn't make any difference in terms of data durability either.

If CPU were the limiting factor then it probably would be better to stay on a single node using lots of threads, but there are other things, like I/O bandwidth, that don't scale with the number of processors, and this is, put simply, why it can be better to use multiple nodes. Network latency isn't normally a major concern, but if it is then you can use shard allocation awareness to try to keep searches within a single zone (e.g. a node or a rack) where the latency is best.
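A rough sketch of enabling shard allocation awareness through the cluster settings API, assuming each node has already been given a custom attribute in its elasticsearch.yml (for example `node.attr.rack_id: rack_one`). The attribute name `rack_id`, the host and the helper name `build_awareness_request` are assumptions for illustration. The request is built but not sent.

```python
import json
from urllib.request import Request


def build_awareness_request(base_url, attribute):
    """Build (but do not send) a cluster settings update enabling awareness."""
    body = {
        "persistent": {
            # Tell the allocator which node attribute defines a "zone".
            # Each node is assumed to set node.attr.<attribute> itself.
            "cluster.routing.allocation.awareness.attributes": attribute,
        }
    }
    return Request(
        f"{base_url}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical host and attribute name, for illustration only.
aware_req = build_awareness_request("http://localhost:9200", "rack_id")
print(aware_req.full_url)
```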

