If I have all my elasticsearch instances automatically joined to a load balancer pool (actually a kubernetes service) on start up, would it cause any problems to point the unicast discovery hosts to just the load balancer (kubernetes service) address? This seems to work fine, but I wanted to check that there wasn't something I'm missing.
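For reference, a minimal sketch of that setup, assuming a Kubernetes service named `elasticsearch` in the `default` namespace (the service name and namespace are illustrative, not from the original question):

```yaml
# elasticsearch.yml (sketch) -- point unicast discovery at the
# Kubernetes service DNS name instead of individual node addresses.
# "elasticsearch.default.svc.cluster.local" is an assumed name.
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["elasticsearch.default.svc.cluster.local"]
```

New nodes would then ping whichever pod the service happens to route them to.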
Well it should work as long as the LB doesn't fall over
It works, yes, but do you see any cons to doing this instead of configuring all available masters as unicast hosts?
If the LB falls, will it affect only new nodes? Do already-clustered nodes stop using the discovery settings once they have joined the cluster?
The unicast hosts are used to exchange information between new nodes and the cluster. If the cluster is already formed, a new node can ping any node and ask it for the current master. However, if you have a full cluster restart, where all the nodes are pinging and discovering each other, a LB can (in theory) lead them to miss nodes. For example, if the LB consistently separates the nodes into two halves, those two halves will never discover each other. Since you have mentioned using dedicated master nodes, I would suggest just putting those in the list. Lastly, since you talk about Kubernetes, maybe it has an API which can give the list of current container IPs? If so, you can write a little discovery plugin to ingest that list as a unicast host seed (see how the GCE/EC2 plugins work now).
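The suggestion above (listing the dedicated masters directly rather than the LB) would look roughly like this; the three master hostnames are assumed/illustrative, not from the thread:

```yaml
# elasticsearch.yml (sketch) -- list the dedicated master-eligible
# nodes directly, so a full cluster restart does not depend on how
# the LB happens to route pings. Hostnames es-master-0..2 are assumed.
discovery.zen.ping.unicast.hosts:
  - es-master-0:9300
  - es-master-1:9300
  - es-master-2:9300
# Require a majority of the 3 masters; guards against the
# "two halves never discover each other" split described above.
discovery.zen.minimum_master_nodes: 2
```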
@bleskes Thanks for that info. I've written a plugin to do exactly as you said: discover cluster nodes from the Kubernetes API (https://github.com/fabric8io/elasticsearch-cloud-kubernetes). We were seeing if we could get rid of that & just point discovery at the service load balancer, but as you've alluded to, it is better to find the individual masters.