Distributing primary shards?


(Maxwell Flanders) #1

Hey there! I have one particularly high-throughput logging cluster that processes about 1 TB of data per day. Most of that data goes into a single index with 25 shards (which intentionally matches the number of nodes in the cluster).

I was looking at the cluster today and noticed that the primary shards aren't actually distributed evenly across the nodes: some nodes have 2 primary shards, and some have 0. Wouldn't the best configuration for maximum indexing speed be one primary shard per node? There are almost no other relevant indices on that cluster.

Is there a setting that tells Elasticsearch not to allow two primary shards from the same index to live on the same node?
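
For readers who want to check the same thing on their own cluster, here is a minimal sketch that tallies primaries and total shard copies per node. It assumes the input is text in the shape returned by `GET _cat/shards?h=index,shard,prirep,node`; the sample rows, the index name `logs`, and the node names are hypothetical, and the parsing assumes node names contain no spaces.

```python
from collections import Counter

# Hypothetical sample shaped like `GET _cat/shards?h=index,shard,prirep,node`.
SAMPLE = """\
logs 0 p node-1
logs 0 r node-2
logs 1 p node-1
logs 1 r node-3
logs 2 p node-2
logs 2 r node-3
"""


def count_copies_per_node(cat_shards_text, index_name):
    """Return (primaries per node, all shard copies per node) for one index."""
    primaries = Counter()
    totals = Counter()
    for line in cat_shards_text.splitlines():
        parts = line.split()
        if len(parts) < 4:
            continue  # blank line, or an unassigned shard with no node column
        index, _shard, prirep, node = parts[:4]
        if index != index_name:
            continue
        totals[node] += 1
        if prirep == "p":
            primaries[node] += 1
    return primaries, totals


primaries, totals = count_copies_per_node(SAMPLE, "logs")
print("primaries per node:", dict(primaries))
print("all copies per node:", dict(totals))
```

In this made-up sample the primaries are uneven (two on node-1, none on node-3) while every node holds two copies in total, which is the situation described above.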


(Yannick Welsch) #2

Primaries and replicas do approximately the same amount of work when indexing, so there is no need to balance the primaries.


(Imran Siddique) #3

Nodes with primaries handle the indexing request and then follow up with other nodes to ensure the replication completes. So, in a way, they carry more of the network and communication burden. No?


(Yannick Welsch) #4

"Nodes with primaries handle the indexing request and then follow up with other nodes to ensure the replication completes. So, in a way, they carry more of the network and communication burden. No?"

Technically yes, but practically no ;) The communication burden is often negligible. Only in very rare cases would primary balance bring a little more performance. Establishing and maintaining primary balance also comes at a cost, though, as the balancer has to do more shard shuffling when a node fails.


(Imran Siddique) #5

Yup. You are right.


(Maxwell Flanders) #6

Wait, so as long as the TOTAL of primaries and replicas for an index is distributed evenly across all my nodes, I can consider my cluster essentially balanced? Is that what you are saying?


(Yannick Welsch) #7

Elasticsearch takes two properties into account for balancing shards:

  • Having approximately the same number of shards on each node (independently of the index they belong to).
  • Spreading the shards of the same index across the nodes (this is the one you mentioned).

As these goals can sometimes conflict, ES provides a setting to influence the importance of one goal over the other, see here:

https://www.elastic.co/guide/en/elasticsearch/reference/current/shards-allocation.html#_shard_balancing_heuristics
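
As a rough illustration of what adjusting that trade-off could look like, the sketch below builds a request body for `PUT _cluster/settings` using the `cluster.routing.allocation.balance.shard` and `cluster.routing.allocation.balance.index` settings described on the linked page. The helper name `build_balance_settings` and the values 0.3 and 0.7 are made up for the example and are not a recommendation; nothing is sent to a cluster.

```python
import json


def build_balance_settings(shard_factor, index_factor, persistent=True):
    """Build a body for `PUT _cluster/settings` (hypothetical helper).

    shard_factor weights "same number of shards on every node";
    index_factor weights "spread each index's shards across nodes".
    """
    if shard_factor < 0 or index_factor < 0:
        raise ValueError("balance factors must be non-negative")
    scope = "persistent" if persistent else "transient"
    return {
        scope: {
            "cluster.routing.allocation.balance.shard": shard_factor,
            "cluster.routing.allocation.balance.index": index_factor,
        }
    }


# Illustrative values only: lean a little more toward per-index spreading.
body = build_balance_settings(0.3, 0.7)
print(json.dumps(body, indent=2))
```

Raising the index factor relative to the shard factor favours spreading each index's shards across nodes; neither setting refers to primaries specifically.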


(Yannick Welsch) #8

Also note that neither of the balancing properties I've just mentioned distinguishes between primary and replica shards; they're treated as equal with respect to balancing.
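
To make that concrete, here is a toy model (not Elasticsearch's allocator code) that computes the two quantities mentioned above, shards per node and shards per node per index, while deliberately ignoring the primary/replica role. The function name `balance_view` and both layouts are hypothetical.

```python
from collections import Counter


def balance_view(assignments):
    """Count shard copies per node and per (node, index), ignoring the role.

    assignments: iterable of (node, index, shard_id, role), role is "p" or "r".
    """
    per_node = Counter()
    per_node_index = Counter()
    for node, index, _shard_id, _role in assignments:  # role is never read
        per_node[node] += 1
        per_node_index[(node, index)] += 1
    return per_node, per_node_index


# Layout A: both primaries sit on n1, both replicas on n2.
layout_a = [
    ("n1", "logs", 0, "p"), ("n2", "logs", 0, "r"),
    ("n1", "logs", 1, "p"), ("n2", "logs", 1, "r"),
]
# Layout B: same placement of copies, but one primary on each node.
layout_b = [
    ("n1", "logs", 0, "p"), ("n2", "logs", 0, "r"),
    ("n1", "logs", 1, "r"), ("n2", "logs", 1, "p"),
]

same = balance_view(layout_a) == balance_view(layout_b)
print("layouts look identical to this model:", same)
```

Under this simplified view the two layouts are indistinguishable, which matches the point that balancing counts shard copies rather than primaries.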

