I have a number of remote instances of Logstash that send logs over https to an Elasticsearch cluster.
I'm looking for a way to load-balance the requests across the nodes in the cluster, ideally without having to specify a separate subdomain for each node and list them all.
For example, I'd have https://elk.mycluster.com with two DNS A records pointing to 123.123.123.1 and 123.123.123.2, and the DNS provider would round-robin requests across them.
Not sure it would work in your setup, but you could provide Logstash with a single host and use the sniffing parameter to have it auto-populate the hosts list with all the nodes participating in that cluster.
Then Logstash would take care of the load-balancing itself.
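If it helps, here's a minimal sketch of what that output block might look like, using the hostname from your example (`sniffing` is the option name in the elasticsearch output plugin; the port is an assumption):

```
output {
  elasticsearch {
    # Single seed host; sniffing discovers the other nodes in the cluster
    # and adds them to the connection pool automatically.
    hosts    => ["https://elk.mycluster.com:9200"]
    sniffing => true
  }
}
```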
Off the top of my head, the 2 major concerns would be:
Having dedicated master-eligible nodes in your cluster.
Those nodes shouldn't handle bulk indexing requests, but they would still be included in the hosts list produced by sniffing.
However, you need to explicitly configure nodes as dedicated masters, so if you haven't, your nodes are most likely both master-eligible and data nodes, and you should be fine.
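For reference, a dedicated master is something you'd have set up deliberately with settings like these in elasticsearch.yml (this is the pre-7.x `node.master`/`node.data` style; newer versions use `node.roles` instead):

```yaml
# elasticsearch.yml on a dedicated master-eligible node
node.master: true   # may be elected master
node.data: false    # holds no data, so it shouldn't receive bulk indexing
node.ingest: false  # runs no ingest pipelines
```

If you never added settings like these, all your nodes keep the default roles and any of them can serve indexing traffic.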
I'm not sure whether the sniffing option will return the IP of each node or its hostname. If the latter, you would still need DNS resolution, so it might not be much easier than your initial idea.
Edit: By "initial idea" I mean having separate DNS records for each node and passing them into the hosts list. In any case, you don't need any form of load balancing in front of your cluster, since Logstash will do that itself across all provided hosts.
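That explicit-list variant would look something like this (the per-node hostnames are made up for illustration); Logstash round-robins bulk requests across every entry in `hosts` on its own:

```
output {
  elasticsearch {
    # No external load balancer needed: Logstash balances
    # across all hosts listed here itself.
    hosts => [
      "https://node1.mycluster.com:9200",
      "https://node2.mycluster.com:9200"
    ]
  }
}
```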