Adding a new node to a cluster

I'm testing my environment, currently with 2 nodes (2 different machines), and planning for 3-4 nodes in the future (for something like a billion records).
From what I've seen so far, I need to start the secondary node first, and only then start the primary node (if I started the primary node and then the other node, they failed to communicate).

So my question is: if I now want to add a third machine to the cluster, do I need to take down the primary node?
Or is there a way to keep everything running, so I can just start the third machine and it will be fine?

> From what I've seen so far, I need to start the secondary node first, and only then start the primary node (if I started the primary node and then the other node, they failed to communicate).

That's not how it's supposed to behave, but without details it's not possible to make any suggestions.

When you say "primary node", are you talking about the master node?

> Or is there a way to keep everything running, so I can just start the third machine and it will be fine?

Nodes can be added and removed from an ES cluster without having to bring down any of the existing nodes.

Yeah, I meant the master node.
I will try it again today, and if needed I'll post the logs.

Another related question: let's say I now have a billion records spread over the 2 machines in the cluster.
After adding the 3rd machine, do I need to re-index the data, or will the data rearrange itself some other way?

The answer is, it depends on how many shards (and replicas) your index(es) have.

Every index is divided into 1 or more shards. By default I think you get 5 shards.

The shard count cannot be changed for an existing index.
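
Since the shard count is fixed, you pick it when the index is created. A minimal sketch with curl (the index name and shard count here are just placeholders, not taken from this thread):

```
# create an index with an explicit shard count; this cannot be changed later
curl -XPUT 'http://localhost:9200/my_index' -d '{
  "settings": { "number_of_shards": 5 }
}'
```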

Each shard has by definition one primary copy. It can (should) also have 1 to N replicas (think, redundant copies).

It is typical to maintain 1 or 2 replicas (or in some cases more) of every shard. Then when a shard is lost (node goes offline, maybe forever), service is not interrupted. New replicas are automatically remade.

The replica count can be changed for an existing index. It can be set low during initial indexing, then increased once indexing is complete, for example.
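
For example, replicas can be raised on a live index through the index settings API (index name and values are placeholders):

```
# bump replicas from 0 (fast initial bulk load) to 1 once indexing is complete
curl -XPUT 'http://localhost:9200/my_index/_settings' -d '{
  "index": { "number_of_replicas": 1 }
}'
```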

To answer your question,

By default, ES will attempt to balance all the shards (primaries and their replicas) across the available data nodes. Unless disabled, shards will autobalance according to reasonably conservative and appropriate defaults.

So: unless your index was built with only one shard, one would expect shards to migrate when a new data node joins the cluster. :slight_smile:

(Data node means: a node that is allowed to hold shard data. In large clusters, master nodes typically do not hold data. It is also possible to have client nodes, which handle queries but do not hold data or act as masters. More types are coming...)
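
For reference, these roles are controlled by two settings in elasticsearch.yml; a sketch of the usual combinations (naming as in the 1.x/2.x-era config used elsewhere in this thread):

```
# dedicated master-eligible node: manages the cluster, holds no shard data
node.master: true
node.data: false

# data-only node: holds shard data, never becomes master
node.master: false
node.data: true

# client node: neither master-eligible nor a data holder; routes and aggregates requests
node.master: false
node.data: false
```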

There are a lot of settings related to this that you will want to understand and tune appropriately for your index and use case!
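
As an illustration only (the values are examples, not recommendations), some of these allocation and recovery settings can be changed at runtime via the cluster settings API:

```
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.enable": "all",
    "cluster.routing.allocation.node_concurrent_recoveries": 2,
    "indices.recovery.max_bytes_per_sec": "40mb"
  }
}'
```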

best regards,
aaron

So I'm still having this problem...

I have two machines (nodes): 233 and 18.

When I start 18 and then 233, it works fine.
When I start 233 and then 18, the nodes don't communicate.

233 log
233 Config
18 log
18 Config

The "all_shards_failed" error also happens in the opposite scenario; after a few seconds it stops and the cluster health becomes green.

Both nodes need to have a list of hosts for unicast configured. It seems like only the 233 node currently has this configured.
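
Something along these lines in both elasticsearch.yml files (the IPs are placeholders for the real addresses of the 233 and 18 machines):

```
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["192.168.0.233:9300", "192.168.0.18:9300"]
```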

From the configuration files,

  • node-233 is configured either as a master or a master+data node (by default, I think it's a master+data node) with a specific network.host. Based on what you described, it behaves more like a master node than a data node (nothing wrong with that). Since node-18 is a data node (because node.master is set to false), you don't need to include its IP:PORT in discovery.zen.ping.unicast.hosts.

  • node-18 is configured as a data node with a network.host of 0.0.0.0 (which is okay; it means ES will listen on all interfaces, not just one) and multicast discovery enabled. In its config file, you need to add the discovery.zen.ping.unicast.hosts parameter and point it to node-233, and set discovery.zen.ping.multicast.enabled to false.

With these two nodes, you can leave them both as master+data nodes. This way your cluster has two master-eligible nodes and two data nodes, so you can set discovery.zen.minimum_master_nodes to 2 to avoid the split-brain issue.
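
Putting that together, both nodes would end up with something like this in elasticsearch.yml (IPs and cluster name are placeholders, adjust to your environment):

```
cluster.name: my_cluster                      # must be identical on every node
node.master: true                             # master-eligible
node.data: true                               # also holds shard data
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["192.168.0.233:9300", "192.168.0.18:9300"]
discovery.zen.minimum_master_nodes: 2         # (master-eligible nodes / 2) + 1
```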

For configuring discovery.zen.minimum_master_nodes and more...
https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery-zen.html

Read more about the split brain issue

Doing this allows you to easily add one or more nodes to your cluster in the future.

  • If you add another master+data node, the configuration is similar to these two nodes; also add the new node's IP:PORT to the discovery.zen.ping.unicast.hosts parameter on the existing nodes, then start it up. As long as it is configured with the same cluster name, it should be able to join the existing cluster. To update an existing node, shut down one node, update its config, then bring it back up; then work on the next one. You don't have to shut down the entire cluster while doing this; only shut down the node you need to modify, then bring it back once done.

  • If you add another data node, set node.master to false, set discovery.zen.ping.unicast.hosts with the IP:PORT of all master nodes, set discovery.zen.ping.multicast.enabled to false, use the same cluster name, and start it up. It should join the existing cluster automatically and ES will start rebalancing shards across the cluster for you. You don't have to do anything, just let ES handle it... sit back and watch its magic :wink:
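
A sketch of what such a data-only node's elasticsearch.yml could look like, following the advice above (IPs and cluster name are placeholders):

```
cluster.name: my_cluster                      # same cluster name as the existing nodes
node.master: false                            # data-only, never elected master
node.data: true
discovery.zen.ping.multicast.enabled: false
# list the existing master-eligible nodes so the new node can find the cluster
discovery.zen.ping.unicast.hosts: ["192.168.0.233:9300", "192.168.0.18:9300"]
```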

I thought about that, so I changed it, but the error still occurred...

Great response. Looks like everything is working now... thanks!