If all my Elasticsearch instances automatically join a load balancer pool (actually a Kubernetes service) on startup, would pointing the unicast discovery hosts at just the load balancer (Kubernetes service) address cause any problems? This seems to work fine, but I want to check that I'm not missing something.
The unicast hosts are used to exchange information between new nodes joining the cluster. If the cluster is already formed, a new node can ping any node and ask it for the current master. However, during a full cluster restart, when all the nodes are pinging and discovering each other at once, a load balancer can (in theory) cause them to miss nodes. For example, if it consistently splits the nodes into two halves, those two halves will never discover each other. Since you mentioned using dedicated master nodes, I would suggest putting just those in the list. Lastly, since you are using Kubernetes, maybe it has an API that can return the list of current container IPs? If so, you could write a small discovery plugin that uses that list as the unicast host seeds (see how the GCE/EC2 plugins work now).
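To make the "two halves" failure concrete, here is a toy Python model. It is a simplification, not Elasticsearch's actual Zen discovery algorithm: every node's only seed is the load balancer, the LB forwards each ping to some backend chosen by a routing function, and the two nodes in a ping merge what they know. The six node names and both routing functions are hypothetical. With a pathological LB that always keeps a node's pings inside its own half, each half converges on itself and never learns about the other.

```python
from typing import Callable, Dict, List, Set

# Hypothetical six-node cluster, split into two halves for the pathological case.
NODES: List[str] = [f"n{i}" for i in range(6)]
HALF_A, HALF_B = NODES[:3], NODES[3:]


def discover(nodes: List[str], route: Callable[[str, int], str],
             rounds: int = 10) -> Dict[str, Set[str]]:
    """Toy discovery: each round, every node pings the LB address, the LB
    forwards the ping to route(node, round), and both nodes merge their views."""
    known: Dict[str, Set[str]] = {n: {n} for n in nodes}
    for r in range(rounds):
        for n in nodes:
            target = route(n, r)
            merged = known[n] | known[target]
            known[n] = merged
            known[target] = set(merged)
    return known


def partitioning_lb(node: str, r: int) -> str:
    """Pathological LB: a node's pings always land in its own half."""
    group = HALF_A if node in HALF_A else HALF_B
    return group[r % len(group)]


def spreading_lb(node: str, r: int) -> str:
    """Well-behaved LB: pings rotate across all backends."""
    return NODES[(NODES.index(node) + r + 1) % len(NODES)]


if __name__ == "__main__":
    print(sorted(discover(NODES, partitioning_lb)["n0"]))  # only n0, n1, n2
    print(sorted(discover(NODES, spreading_lb)["n0"]))     # all six nodes
```

Real load balancers are rarely this adversarial, which is why the problem is only "in theory"; but nothing in the LB's contract rules it out, whereas listing the masters explicitly does.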
@bleskes Thanks for the info. I've written a plugin that does exactly what you describe: it discovers cluster nodes via the Kubernetes API (https://github.com/fabric8io/elasticsearch-cloud-kubernetes). We were checking whether we could drop that and just point discovery at the service load balancer, but as you've alluded to, it's better to discover the individual masters.
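For anyone wanting the same idea without reading the plugin source, a minimal sketch of the core step: turning a Kubernetes Endpoints object (as returned by `GET /api/v1/namespaces/{namespace}/endpoints/{name}`) into a list of `ip:port` unicast seeds. This is not the plugin's actual code. It assumes the JSON has already been fetched and parsed, that the service exposes the Elasticsearch transport port (9300 by default) under a port named `transport`, and that the service name `elasticsearch-discovery` in the example is hypothetical. Only ready `addresses` are used; `notReadyAddresses` are skipped.

```python
from typing import Any, Dict, List, Optional


def seed_hosts(endpoints: Dict[str, Any], port_name: Optional[str] = None) -> List[str]:
    """Convert a parsed Kubernetes Endpoints object into sorted "ip:port" seeds.

    If port_name is given, only ports with that name are used; otherwise every
    listed port is paired with every ready address in the same subset.
    """
    hosts = set()
    for subset in endpoints.get("subsets") or []:
        ports = [p["port"] for p in subset.get("ports") or []
                 if port_name is None or p.get("name") == port_name]
        # Only ready addresses; notReadyAddresses are deliberately ignored.
        for addr in subset.get("addresses") or []:
            for port in ports:
                hosts.add(f"{addr['ip']}:{port}")
    return sorted(hosts)


if __name__ == "__main__":
    example = {
        "kind": "Endpoints",
        "metadata": {"name": "elasticsearch-discovery"},  # hypothetical service name
        "subsets": [{
            "addresses": [{"ip": "10.244.2.7"}, {"ip": "10.244.1.5"}],
            "notReadyAddresses": [{"ip": "10.244.3.9"}],
            "ports": [{"name": "transport", "port": 9300}],
        }],
    }
    print(seed_hosts(example, "transport"))  # ['10.244.1.5:9300', '10.244.2.7:9300']
```

If the Endpoints object belongs to a service that selects only the dedicated master pods (an assumption about how the pods are labelled), the resulting list matches the advice above: seed discovery with the individual masters rather than with a single load-balanced address.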