New RHEL-based, 5-node Elasticsearch cluster installed here (version 7.13.3).
The Elasticsearch clients are Logstash and some in-house applications.
My main concern is node failure.
I know that Logstash will load-balance connections to Elasticsearch (the pipeline output section's `hosts` setting lists all node names), but what about the other applications?
Do I need to configure an external load balancer (like HAProxy or nginx) in front of Elasticsearch so that other applications can point to it instead of pointing to one node (or even an array of nodes)? Or is there another, better solution?
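For reference, the Logstash output I mentioned looks roughly like this (hostnames and ports are placeholders for our nodes):

```
output {
  elasticsearch {
    # Logstash spreads requests across all hosts listed here,
    # so a single node failure does not stop the pipeline
    hosts => ["http://es-node1:9200", "http://es-node2:9200", "http://es-node3:9200",
              "http://es-node4:9200", "http://es-node5:9200"]
  }
}
```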
How are you connecting your other applications to Elasticsearch? All official Elasticsearch clients have a feature called sniffing; see, for example, the Java one: Usage | Java REST Client [7.13] | Elastic
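To sketch what that looks like with the low-level Java REST client (the hostnames are placeholders and the sniff interval is just an example value):

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.sniff.Sniffer;

public class SniffingExample {
    public static void main(String[] args) throws Exception {
        // Seed the client with a couple of known nodes; the sniffer discovers the rest
        RestClient restClient = RestClient.builder(
                new HttpHost("es-node1", 9200, "http"),
                new HttpHost("es-node2", 9200, "http"))
            .build();

        // Periodically fetch the cluster's current node list and update the client
        // (the default interval is 5 minutes)
        Sniffer sniffer = Sniffer.builder(restClient)
            .setSniffIntervalMillis(60_000) // refresh every minute instead
            .build();

        // ... use restClient for requests; it round-robins across the known nodes ...

        // Close the sniffer before the client it wraps
        sniffer.close();
        restClient.close();
    }
}
```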
I don't know how the applications are developed (or what language(s) are used), but I'll point the developers to that sniffing feature. It seems very useful, and if I understood correctly, after an initial successful connection the application knows the available nodes and can also control when to update the node list (even in case of failure).
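If I read the linked docs right, the "update on failure" part is wired up with a SniffOnFailureListener, roughly like this (the delay value is just an example):

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.sniff.SniffOnFailureListener;
import org.elasticsearch.client.sniff.Sniffer;

// The listener must be attached to the client before the sniffer exists,
// then be handed the sniffer afterwards
SniffOnFailureListener listener = new SniffOnFailureListener();
RestClient restClient = RestClient.builder(new HttpHost("es-node1", 9200, "http"))
    .setFailureListener(listener) // triggers a sniff whenever a node fails
    .build();
Sniffer sniffer = Sniffer.builder(restClient)
    .setSniffAfterFailureDelayMillis(30_000) // after a failure sniff, sniff again 30s later
    .build();
listener.setSniffer(sniffer);
```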
I assume the reason I'm not finding much info on this subject (external load balancers) is that applications are expected to take care of it themselves, right?
A load balancer would work as well, but that is another component you have to run and maintain yourself, and it also sounds like a single point of failure. Given that the official clients support sniffing, I personally prefer running less infrastructure with a bit more logic in the application, but that is subjective, I guess.