I'm testing my environment, currently with 2 nodes (2 different machines), and 3-4 nodes in the future (for something like a billion records).
Now, from what I saw so far, I need to run the secondary node first, and only then run the primary node (if I started the primary node and then the other node, they failed to communicate).
So my question is: if I now want to add a third machine to the cluster, do I need to take down the primary node?
Or is there a way that everything will keep running, and I'll just start the third machine and it will be OK?
Now, from what I saw so far, I need to run the secondary node first, and only then run the primary node (if I started the primary node and then the other node, they failed to communicate).
That's not how it's supposed to behave, but without details it's not possible to make any suggestions.
When you say "primary node", are you talking about the master node?
Or is there a way that everything will keep running, and I'll just start the third machine and it will be OK?
Nodes can be added and removed from an ES cluster without having to bring down any of the existing nodes.
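If you want to sanity-check this on your own setup, you can list the cluster's nodes before and after starting the new machine (assuming the default HTTP port 9200; adjust the host to taste):

```
# Run against any existing node; repeat after starting the third machine
# and the new node should appear in the list, with no restarts needed
curl 'localhost:9200/_cat/nodes?v'
```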
Yeah, I meant the master node.
I will try it again today, and if needed I'll post the logs.
Another related question: let's say I now have a billion records spread over 2 machines in the cluster.
After adding the 3rd machine, do I need to re-index the data? Or is there another way for the data to rearrange itself?
The answer is, it depends on how many shards (and replicas) your index(es) have.
Every index is divided into 1 or more shards. By default I think you get 5 shards.
The shard count cannot be changed for an existing index.
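Since the shard count is fixed at creation time, it's worth setting it explicitly if you know you'll be growing to 3-4 nodes. A minimal sketch (the index name myindex is just a placeholder; 6 shards is only an example, chosen because it divides evenly across 2, 3, or 6 data nodes):

```
# Create an index with an explicit shard count; this cannot be changed later
curl -XPUT 'localhost:9200/myindex' -d '{
  "settings": {
    "number_of_shards": 6,
    "number_of_replicas": 1
  }
}'
```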
Each shard has by definition one primary copy. It can (should) also have 1 to N replicas (think, redundant copies).
It is typical to maintain 1 or 2 replicas (or in some cases more) of every shard. Then when a shard is lost (a node goes offline, maybe forever), service is not interrupted. New replicas are automatically remade.
The replica count can be changed for an existing index. It can be set low during initial indexing, then increased once indexing is complete, for example.
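For example, something along these lines (again, myindex is a placeholder):

```
# Keep replicas at 0 during the initial bulk load...
curl -XPUT 'localhost:9200/myindex/_settings' -d '{
  "index": { "number_of_replicas": 0 }
}'

# ...then raise the count once indexing is done;
# ES builds the extra copies in the background
curl -XPUT 'localhost:9200/myindex/_settings' -d '{
  "index": { "number_of_replicas": 2 }
}'
```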
To answer your question,
By default, ES will attempt to balance all the shards (primaries and their replicas) across the available data nodes. Unless disabled, shards will autobalance according to reasonably conservative and appropriate defaults.
So: unless your index was built with only one shard, one would expect shards to migrate when a new data node joins the cluster.
(Data node means: a node that is allowed to hold shard data. In large clusters, the master nodes typically do not hold data. It is also possible to have client nodes, which handle queries but do not hold data or act as masters. More types are coming...)
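If you want to watch the rebalancing happen, the _cat APIs are handy, e.g.:

```
# Shows which node each shard (primary or replica) lives on;
# after a new data node joins you should see shards in RELOCATING state
curl 'localhost:9200/_cat/shards?v'

# Cluster-level view, including counts of relocating/initializing shards
curl 'localhost:9200/_cluster/health?pretty'
```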
There are a lot of settings related to this that you will want to understand and tune appropriately for your index and use case!
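As a starting point, you can inspect and adjust them through the cluster settings API. One illustrative (not prescriptive) example:

```
# View any non-default cluster-wide settings
curl 'localhost:9200/_cluster/settings?pretty'

# Example: limit how many shards may rebalance concurrently
# ("transient" means the setting is not kept across a full cluster restart)
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.cluster_concurrent_rebalance": 2 }
}'
```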
node-233 is configured either as a master or a master+data node (I think by default it's configured as a master+data node) with a specific network.host --> based on what you described, it behaves more like a master node than a data node (nothing wrong with that). Since node-18 is a data node (because node.master is set to false), you don't need to include its IP:PORT in discovery.zen.ping.unicast.hosts.
node-18 is configured as a data node with a network.host of 0.0.0.0 (which is okay; it means ES will listen on all interfaces, not just one) and discovery.zen.ping.multicast.enabled turned on --> in its config file, you need to add the discovery.zen.ping.unicast.hosts parameter pointing to node-233, and set discovery.zen.ping.multicast.enabled to false.
With these two nodes, you can leave them both as master+data nodes; this way your cluster has two master-eligible nodes and two data nodes, so you can set discovery.zen.minimum_master_nodes to 2 to avoid the split-brain issue.
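As a sketch, the relevant part of elasticsearch.yml on both node-233 and node-18 could look like this (the IP placeholders and the default 9300 transport port are assumptions; substitute your actual addresses):

```
cluster.name: my-cluster            # must be identical on every node
node.master: true
node.data: true
network.host: 0.0.0.0
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["<node-233-ip>:9300", "<node-18-ip>:9300"]
discovery.zen.minimum_master_nodes: 2
```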
Doing this also makes it easy to add one or more nodes to your cluster in the future.
If you add another master+data node, the configuration is similar to these two nodes; add the new node's IP:PORT to the discovery.zen.ping.unicast.hosts parameter on each node and turn it on. As long as it is configured with the same cluster name, it should be able to join the existing cluster. To update the existing nodes, shut down one, update its config, then turn it back on; then work on the next one. You don't have to shut down the entire cluster while doing this; only shut down the one you need to modify, then bring it back once done.
If you add another data node, set node.master to false, set discovery.zen.ping.unicast.hosts to the IP:PORT of all master nodes, set discovery.zen.ping.multicast.enabled to false, use the same cluster name, and turn it on. It should join the existing cluster automatically, and ES will start rebalancing the cluster for you. You don't have to do anything; just let ES handle it... sit back and watch its magic.
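For reference, a config sketch for such a data-only node (same placeholder IPs and transport port as above):

```
cluster.name: my-cluster            # same cluster name as the existing nodes
node.master: false                  # data-only: never eligible to become master
node.data: true
network.host: 0.0.0.0
discovery.zen.ping.multicast.enabled: false
# point at the master-eligible nodes so the new node can discover the cluster
discovery.zen.ping.unicast.hosts: ["<node-233-ip>:9300", "<node-18-ip>:9300"]
```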