Distributing primary shards?

Maxwell_Flanders · December 1, 2016, 8:29pm

Hey there! I have one particularly high-throughput logging cluster which processes about 1 TB of data per day. Most of that data goes into a single index with 25 shards (which intentionally matches the number of nodes in the cluster).

I was looking at the cluster today and noticed that the primary shards aren't distributed evenly across the nodes: some nodes have 2 primary shards, and some have 0. Wouldn't the best configuration for maximum indexing speed be one primary shard per node? There are almost no other relevant indices on that cluster.

Is there a setting that tells Elasticsearch not to allow two primary shards from the same index to live on the same node?

ywelsch · December 2, 2016, 8:52am

Primaries and replicas do approximately the same amount of work when indexing, so there is no need to balance the primaries.

mosiddi · December 2, 2016, 9:09am

Nodes with primaries handle the indexing request and then follow up with the other nodes to ensure replication completes. So, in a way, they carry more of the network and communication burden, no?

ywelsch · December 2, 2016, 9:21am

mosiddi wrote: "Nodes with primaries handle the indexing request and then follow up with the other nodes to ensure replication completes. So, in a way, they carry more of the network and communication burden, no?"

Technically yes, but practically no. The communication burden is often negligible. There are only very rare cases where primary balance would bring a little more performance. Establishing and keeping primary balance comes at a cost as well, though, because the balancer has to do more shard shuffling when a node fails.

mosiddi · December 2, 2016, 10:45am

Yup. You are right.

Maxwell_Flanders · December 2, 2016, 4:00pm

Wait, so as long as the total number of primaries and replicas for an index is distributed evenly across all my nodes, I can consider my cluster essentially balanced? Is that what you are saying?

ywelsch · December 2, 2016, 4:20pm

Elasticsearch takes two properties into account for balancing shards:

1. Having approximately the same number of shards on each node (independently of the index they belong to).
2. Spreading the shards of the same index across the nodes (this is the one you mentioned).

As these goals can sometimes conflict, ES provides a setting to influence the importance of one goal over the other, see here:

https://www.elastic.co/guide/en/elasticsearch/reference/current/shards-allocation.html#_shard_balancing_heuristics
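As a rough illustration of that setting, the sketch below builds a request body for `PUT _cluster/settings` that shifts weight between the two goals. The setting names are the ones listed under the shard balancing heuristics in the documentation linked above; the helper name `balance_settings_body` and the example weights 0.3 and 0.7 are made up for this sketch and are not recommendations. It only prints the JSON and does not contact a cluster.

```python
import json


def balance_settings_body(shard_weight, index_weight, threshold=1.0, persistent=False):
    """Build a JSON-serializable body for PUT _cluster/settings.

    shard_weight  -> weight of goal 1 (same number of shards on each node)
    index_weight  -> weight of goal 2 (spread each index's shards across nodes)
    threshold     -> how much imbalance is tolerated before shards are moved
    """
    # Sanity check for this sketch only: negative weights make no sense here.
    if shard_weight < 0 or index_weight < 0:
        raise ValueError("weights must be non-negative")

    scope = "persistent" if persistent else "transient"
    return {
        scope: {
            "cluster.routing.allocation.balance.shard": shard_weight,
            "cluster.routing.allocation.balance.index": index_weight,
            "cluster.routing.allocation.balance.threshold": threshold,
        }
    }


# Illustrative values only: favor spreading each index over equal shard counts.
body = balance_settings_body(0.3, 0.7)
print(json.dumps(body, indent=2))
```

The printed JSON could then be sent with any HTTP client. Neither weight refers to primaries specifically, so this does not force one primary per node.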

ywelsch · December 2, 2016, 4:22pm

Also note that neither of the balancing properties I've just mentioned distinguishes between primary and replica shards; they're treated as equal w.r.t. balancing.
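To see what this looks like on a cluster, one could tally primaries and total shards per node from the output of `GET _cat/shards/<index>?h=index,shard,prirep,state,node`. The sketch below does that for a small made-up sample: the index name `logs`, the node names, and the helper `tally_shards` are all hypothetical. It assumes node names contain no spaces and that the columns arrive in exactly that order.

```python
from collections import Counter

# Hypothetical _cat/shards output: index, shard, prirep (p or r), state, node.
SAMPLE = """\
logs 0 p STARTED node-1
logs 0 r STARTED node-2
logs 1 p STARTED node-1
logs 1 r STARTED node-3
logs 2 p STARTED node-2
logs 2 r STARTED node-3
"""


def tally_shards(cat_output):
    """Return (primaries_per_node, total_shards_per_node) as Counters."""
    primaries, totals = Counter(), Counter()
    for line in cat_output.splitlines():
        parts = line.split()
        if len(parts) < 5:
            # Lines without a node column (e.g. unassigned shards) are skipped.
            continue
        _index, _shard, prirep, _state, node = parts[:5]
        totals[node] += 1
        if prirep == "p":
            primaries[node] += 1
    return primaries, totals


primaries, totals = tally_shards(SAMPLE)
for node in sorted(totals):
    print(f"{node}: primaries={primaries[node]} total={totals[node]}")
```

In this sample every node holds two shards, so the cluster counts as balanced in the sense described above, even though node-1 holds two primaries and node-3 holds none.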

