Load balancer in front of Elasticsearch

Hello,

I've got a setup with 5 data nodes and 3 master nodes. I understand that Elasticsearch has built-in load balancing, and that I could just route all traffic to one of the nodes and call it a day.

I've seen several articles about setting up a load balancer in front of Elasticsearch to increase availability, in case the node I'm routing traffic to goes down.

I can't find anything about this in the official documentation. Can anyone chime in on the best setup from a performance perspective?

I would also love to hear how I could scientifically test the performance of these different setups.

If you are using one of the official language clients (strongly recommended), it can 'sniff' your cluster when it connects and discover all the nodes, so that if the node it was using goes down it can switch to another one. It is also recommended to give the client a seed list of a few nodes (maybe 3), in case one of them is down on the first connection attempt, before sniffing has completed. This removes the need for a load balancer in front of Elasticsearch to deal with node failures.
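To make the seed-list-plus-sniffing behaviour concrete, here is a minimal sketch in Python of what a client does internally. It is an illustration under stated assumptions, not any official client's API: `sniff` is a hypothetical callable that asks one reachable node for the addresses of all nodes in the cluster, and the `es1:9200`-style addresses are made up. The pool tries each seed until one answers, round-robins over the discovered nodes, skips nodes marked dead, and re-sniffs through the seed list if every known node has failed.

```python
from typing import Callable, Iterable, List, Set


class SniffingNodePool:
    """Sketch of client-side node discovery with a seed list and failover.

    `sniff` is a hypothetical callable: given one reachable node, it returns
    the HTTP addresses of every node in the cluster. Real clients implement
    this internally; this is not their API.
    """

    def __init__(self, seeds: Iterable[str], sniff: Callable[[str], List[str]]):
        self.seeds = list(seeds)
        self.sniff = sniff
        self.nodes: List[str] = []
        self.dead: Set[str] = set()
        self._i = 0  # round-robin position

    def refresh(self) -> None:
        # Try each seed in turn: if the first seed is down, fall back to the next.
        for seed in self.seeds:
            try:
                self.nodes = self.sniff(seed)
                self.dead.clear()
                return
            except ConnectionError:
                continue
        raise ConnectionError("no seed node reachable")

    def mark_dead(self, node: str) -> None:
        # Called when a request to `node` fails; it is skipped until the next refresh.
        self.dead.add(node)

    def next_node(self) -> str:
        live = [n for n in self.nodes if n not in self.dead]
        if not live:
            # Every known node has failed: re-sniff via the seed list.
            self.refresh()
            live = [n for n in self.nodes if n not in self.dead]
            if not live:
                raise ConnectionError("cluster reported no nodes")
        node = live[self._i % len(live)]
        self._i += 1
        return node


if __name__ == "__main__":
    cluster = [f"es{i}:9200" for i in range(1, 6)]  # hypothetical addresses

    def fake_sniff(seed: str) -> List[str]:
        # Simulate the first seed being down at startup.
        if seed == "es1:9200":
            raise ConnectionError(seed)
        return cluster

    pool = SniffingNodePool(cluster[:3], fake_sniff)
    pool.refresh()              # succeeds via es2 after es1 fails
    pool.mark_dead("es1:9200")  # es1 stops answering later
    print([pool.next_node() for _ in range(5)])
    # ['es2:9200', 'es3:9200', 'es4:9200', 'es5:9200', 'es2:9200']
```

The seed list only matters for the very first connection (or a full re-sniff); after that, the client works from the node list the cluster itself reports, which is why a separate load balancer isn't needed just for failover.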

Hope that helps.

We use HAProxy with round-robin balancing to forward REST calls to the client nodes' HTTP port, and Java transport client calls to their transport port.
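The round-robin approach can be sketched as follows. This is a hedged Python illustration of the idea, roughly what HAProxy provides with `balance roundrobin` and a `check` on each server line; it is not HAProxy's implementation, and `is_healthy` is a hypothetical stand-in for a real health check. The `distribute` helper counts how many requests each node receives, which shows the traffic being spread evenly across the healthy client nodes while a failed one is skipped.

```python
from typing import Callable, Dict, List


class RoundRobinBalancer:
    """Sketch of round-robin forwarding that skips unhealthy backends,
    roughly what HAProxy does with `balance roundrobin` and a `check` on
    each server line. Illustration only, not HAProxy's implementation."""

    def __init__(self, backends: List[str], is_healthy: Callable[[str], bool]):
        self.backends = list(backends)
        self.is_healthy = is_healthy  # hypothetical health-check callable
        self._i = 0

    def pick(self) -> str:
        # Walk at most one full rotation looking for a healthy backend.
        for _ in range(len(self.backends)):
            backend = self.backends[self._i % len(self.backends)]
            self._i += 1
            if self.is_healthy(backend):
                return backend
        raise ConnectionError("no healthy backend")


def distribute(balancer: RoundRobinBalancer, requests: int) -> Dict[str, int]:
    # Count how many requests each backend receives.
    counts: Dict[str, int] = {}
    for _ in range(requests):
        backend = balancer.pick()
        counts[backend] = counts.get(backend, 0) + 1
    return counts


if __name__ == "__main__":
    nodes = ["client1:9200", "client2:9200", "client3:9200"]  # hypothetical
    down = {"client2:9200"}
    lb = RoundRobinBalancer(nodes, lambda n: n not in down)
    print(distribute(lb, 6))  # {'client1:9200': 3, 'client3:9200': 3}
```

With every node healthy, each one receives an equal share of requests, so no single node does all of the coordinating work.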

As mentioned, the official clients handle this pretty well. Be aware, though, that if you send all search and indexing traffic through a single node (and that node is also a data node), it will carry a disproportionate load, and with a high volume of searches it will be at elevated risk of heap-related issues. It's a very good idea to spread requests across nodes if at all possible.

Kimbro

Thanks, I decided to go with this approach, seems to be working fine.