Case 1:
I have a 5-node cluster: 2 data nodes and 3 master nodes. Suppose one data node goes down permanently and I add a replacement node with a different IP address.
Will the unassigned shards be assigned to the newly added node?
What configuration changes do I need to make for that to happen? Do I have to list the IP of the newly added node in discovery.zen.ping.unicast.hosts?
Case 2:
Suppose I have a 3-node cluster: two data nodes and one master node. Let's say it has around 500 shards, counting both primary and replica shards.
When I add a new node, will the shards be rebalanced? By rebalance I mean: will some of the existing shards be moved to the newly added node?
No, you only need to list the addresses of the master-eligible nodes in discovery.zen.ping.unicast.hosts.
Yes, assuming a default configuration, i.e. you have nothing in your configuration that stops the shards being allocated to the new node (such as an allocation filtering rule) and you haven't disabled rebalancing.
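As a rough illustration of what "default configuration" means here, the sketch below looks through a flat dictionary of cluster settings for the things mentioned above: disabled or restricted rebalancing, restricted allocation, and cluster-level allocation filters. The function name and the input dict are hypothetical; the dict is assumed to be something you have built yourself, for example by merging the persistent and transient sections of `GET _cluster/settings?flat_settings=true`. Index-level filters (`index.routing.allocation.*`) are not covered.

```python
# Sketch only: `flat_settings` is assumed to be a single flat dict of cluster
# settings (persistent and transient already merged by the caller).
FILTER_PREFIXES = (
    "cluster.routing.allocation.include.",
    "cluster.routing.allocation.exclude.",
    "cluster.routing.allocation.require.",
)


def rebalance_blockers(flat_settings):
    """Return reasons why a newly added node might not receive shards."""
    reasons = []

    # Rebalancing is enabled for all shards unless this setting says otherwise.
    rebalance = flat_settings.get("cluster.routing.rebalance.enable", "all")
    if rebalance != "all":
        reasons.append(
            f"rebalancing restricted: cluster.routing.rebalance.enable={rebalance}"
        )

    # Allocation is enabled for all shards unless this setting says otherwise.
    allocation = flat_settings.get("cluster.routing.allocation.enable", "all")
    if allocation != "all":
        reasons.append(
            f"allocation restricted: cluster.routing.allocation.enable={allocation}"
        )

    # Cluster-level allocation filtering rules (include / exclude / require).
    for key, value in sorted(flat_settings.items()):
        if key.startswith(FILTER_PREFIXES) and value:
            reasons.append(f"allocation filter: {key}={value}")

    return reasons


if __name__ == "__main__":
    example = {
        "cluster.routing.rebalance.enable": "none",
        "cluster.routing.allocation.exclude._ip": "10.0.0.5",  # made-up address
    }
    for reason in rebalance_blockers(example):
        print(reason)
```

An empty result only means none of these particular settings stands in the way; it does not prove that shards will move.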
These are normal settings like circuit breakers.
One question out of curiosity:
I have around 1300 shards in my 3-node cluster (2 data and 1 master).
I added a fourth node and the shards did get rebalanced, but only a small number of them, around 20, got assigned to the new node.
The node specifications are similar, so why is there an imbalance in the number of assigned shards?
1300 shards sounds like a lot for such a small cluster. This article gives some guidance.
Aim to keep the average shard size between at least a few GB and a few tens of GB. For use-cases with time-based data, it is common to see shards between 20GB and 40GB in size. A good rule-of-thumb is to ensure you keep the number of shards per node below 20 per GB heap it has configured. A node with a 30GB heap should therefore have a maximum of 600 shards, but the further below this limit you can keep it the better.
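To make the rule of thumb concrete, here is a small arithmetic sketch. The function names are made up for illustration, and the heap size of the nodes in this thread was not stated, so the 30GB figure is just the example from the quoted guidance.

```python
SHARDS_PER_GB_HEAP = 20  # rule of thumb from the quoted guidance


def max_recommended_shards(heap_gb, shards_per_gb=SHARDS_PER_GB_HEAP):
    """Upper bound on shards for one node with the given heap size in GB."""
    return int(heap_gb * shards_per_gb)


def shards_per_data_node(total_shards, data_nodes):
    """Average number of shards each data node holds if spread evenly."""
    return total_shards / data_nodes


if __name__ == "__main__":
    # A node with a 30GB heap, as in the quoted example.
    print(max_recommended_shards(30))     # 600
    # 1300 shards spread over the 2 original data nodes.
    print(shards_per_data_node(1300, 2))  # 650.0
```

With 1300 shards on 2 data nodes, each node holds about 650 shards, which is already above the 600 limit that would apply even to a 30GB heap.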
Recovery speed is limited to ensure that the cluster remains stable, and only occurs if the cluster health is green. My guess is that the rebalancing was still ongoing, or else the cluster stopped being green.