I have a 3-node cluster (3 VMs with different IPs). After going through some documentation and other community questions, I read that in a small cluster like this one of 3 nodes, it is good for redundancy to make every node both master-eligible and a data node. That way, if 1 node goes down, the other 2 can hold an election and choose a master. The questions I have for my 3 nodes (10.1.1.21, 10.1.1.22, 10.1.1.23) are:
What will happen if 2 of my nodes go down? How can I achieve redundancy in that case?
Which IP do I have to give in fluentd/logstash to send the data to? Can we have some VIP or cluster name? I saw that we can give multiple IPs in "hosts", but I don't think that is a good way to do it. If 1 more node is added in the future, the configuration of all agents needs to be changed.
From which IP can we run Kibana? Let's say I installed Kibana on 10.1.1.21 and defined all 3 IPs under elasticsearch.hosts: in the yaml file (as per the link below). But what will happen if 10.1.1.21 itself is down? How can we access it then, does a person have to change the node manually, or can we have a cluster name/VIP? Use Kibana in a production environment | Kibana Guide [7.13] | Elastic
I also read about voting exclusions, but I am not sure how we can still continue to receive data with 1 node if 2 of my nodes go down. Or is there any other way to achieve this?
An Elasticsearch cluster always requires a strict majority of master-eligible nodes to be available in order to be fully functional. This means that you can only afford to lose 1 node in a 3-node cluster.
Listing multiple hosts is a good way to do it. If the process sending data to Elasticsearch supports sniffing, you can use that to discover new nodes automatically.
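As an illustration, a Logstash output with a seed list of hosts and sniffing enabled might look like this (a sketch using the standard `logstash-output-elasticsearch` plugin; adjust hosts and ports to your setup):

```conf
output {
  elasticsearch {
    # Seed hosts; any reachable one is enough to bootstrap the connection
    hosts => ["10.1.1.21:9200", "10.1.1.22:9200", "10.1.1.23:9200"]
    # Periodically re-discover the list of live nodes from the cluster,
    # so nodes added later are picked up without changing this file
    sniffing => true
  }
}
```

With sniffing on, the output plugin asks the cluster for its current node list, so a new 4th node would be used without touching the agent configuration.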
If you want Kibana to be highly available you need to install it on multiple nodes and allow it to connect to multiple nodes in the cluster.
But installing Kibana on all 3 nodes will still require people to change the IP they use to access it. It should happen automatically, however, if we use a cluster IP or cluster name whose backend IP can change according to availability.
The same goes for fluentd/logstash: give a single virtual IP, and on the backend it should send data to an available node.
You can create this in the environment of the user running logstash, in the file /etc/sysconfig/logstash or use a logstash keystore.
I currently use the file /etc/sysconfig/logstash for my logstash variables. The logstash keystore is a good choice too, but you can't automate it because it does not support stdin input with spaces when creating keys. In either case, you need to restart logstash to refresh the variables.
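As a sketch of that approach (the variable names `ES_HOST1`..`ES_HOST3` are just examples), you would define the hosts once in the environment file:

```conf
# /etc/sysconfig/logstash -- environment for the logstash service
ES_HOST1="10.1.1.21:9200"
ES_HOST2="10.1.1.22:9200"
ES_HOST3="10.1.1.23:9200"
```

and then reference them in the pipeline, since Logstash expands `${VAR}` from the environment in its configuration files:

```conf
output {
  elasticsearch {
    hosts => ["${ES_HOST1}", "${ES_HOST2}", "${ES_HOST3}"]
  }
}
```

This keeps the node list in one place per machine; adding a node still means editing the environment file and restarting logstash, which is why sniffing or a load balancer is nicer for a frequently changing cluster.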
For Kibana, if you have three Kibana instances you could use a VIP or a load balancer. You can create a VIP easily using keepalived, or you can try nginx or haproxy as a load balancer.
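A minimal keepalived sketch for such a VIP could look like this (the interface name, VIP address 10.1.1.100, and password are assumptions you would replace):

```conf
# /etc/keepalived/keepalived.conf on the primary node.
# The other nodes run the same block with "state BACKUP" and a lower priority;
# keepalived moves the VIP to a surviving node if the primary goes down.
vrrp_instance KIBANA_VIP {
    state MASTER
    interface eth0              # adjust to your NIC name
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass changeme      # example password, keep it consistent on all nodes
    }
    virtual_ipaddress {
        10.1.1.100/24           # example VIP; users browse to this address
    }
}
```

Users then always open Kibana via the VIP, regardless of which node currently holds it.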
I am using fluentd to send data to Elasticsearch directly. But if I give just 1 IP and that server goes down, then all write operations will stop. So I was checking for some option to give a cluster name or cluster IP here, which would automatically detect a running server and start sending data to it.
Replicas of the shards will then be made on the other 2 nodes.
For Kibana, let me try making a VIP and installing Kibana on all 3 nodes. So, do you recommend that in kibana.yml I define only the respective local Elasticsearch instance (localhost:9200) as the host, or all 3 ES hosts in every instance of Kibana?
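For reference, listing all three nodes would mean the same `elasticsearch.hosts` block in every instance's kibana.yml (a sketch assuming Elasticsearch on the default port 9200 and plain HTTP), so each Kibana can fail over to the surviving nodes even if its local one is down:

```yaml
# kibana.yml -- identical on all three Kibana instances
elasticsearch.hosts:
  - "http://10.1.1.21:9200"
  - "http://10.1.1.22:9200"
  - "http://10.1.1.23:9200"
```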