Fault tolerance "address" for ES Cluster

Suppose I have 5 ES nodes in a cluster; call them ES1, ES2, ES3, ES4, and ES5. I also have a farm of 20 web servers (call them W01 to W20) that need to point to this cluster. What address should the web servers point to so that, if any 1 or 2 of the ES nodes go down, the web servers can still be served by one of the remaining 3 or 4 active ES nodes?

Example: suppose the configuration is to have W01-W05 point to ES1, and so on, up to W15-W20 pointing to ES5. If ES5 goes down, W15-W20 would not be directed to the other nodes. How can the environment be configured so that W15-W20 are directed to ES1-ES4 if ES5 fails or is taken offline?

You can point to any node in the cluster. I would (optionally) have a few client-only nodes in the cluster and have the web servers connect randomly to one of those client-only nodes. And yes, you could retry another node if one of the nodes is not reachable.

Thanks
Nirmal

nirmalc - Thanks for your reply.

Randomly connecting to a node does not solve the issue of potentially landing on a node that is offline or unavailable. So the solution would be to build a retry function into my application? While that might "work", it seems a poor solution for a production environment where performance is critical.

Use HAProxy or similar to front your ES machines, probe machines to discover when they're down, and deflect traffic away from unavailable nodes.
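As a rough illustration of that setup, here is a minimal HAProxy configuration sketch. The host names (es1.example.com to es5.example.com), the section names, and the timeout and check intervals are assumptions for illustration only; 9200 is the default Elasticsearch HTTP port. The web servers would all point at the HAProxy address, and HAProxy would stop routing to any node that fails its health check.

```
# Sketch only: host names, section names, and timings below are assumed.
defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend es_front
    # W01-W20 connect here instead of to an individual ES node
    bind *:9200
    default_backend es_nodes

backend es_nodes
    balance roundrobin
    # Probe each node over HTTP; nodes that fail the probe are taken out of rotation
    option httpchk GET /
    server es1 es1.example.com:9200 check inter 2s fall 3 rise 2
    server es2 es2.example.com:9200 check inter 2s fall 3 rise 2
    server es3 es3.example.com:9200 check inter 2s fall 3 rise 2
    server es4 es4.example.com:9200 check inter 2s fall 3 rise 2
    server es5 es5.example.com:9200 check inter 2s fall 3 rise 2
```

Note that a single HAProxy instance would itself become a single point of failure, so in practice it would also need to be made redundant.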

Or use one of the official language clients. The language clients all monitor the nodes for you, detect when nodes become unavailable, and route their requests to another node in the cluster.

We have our caller code in .NET and this client [http://nest.azurewebsites.net/] does the load balancing / fault tolerance well.

I reviewed the Elasticsearch.Net client as an example.

http://nest.azurewebsites.net/elasticsearch-net/connecting.html

Here, instead of passing the node directly, we pass a SniffingConnectionPool, which will use our node to discover the rest of the available cluster nodes:

var node = new Uri("http://mynode.example.com:8082/apiKey");
// The pool expects a collection of URIs, so wrap the node in an array
var connectionPool = new SniffingConnectionPool(new[] { node });
var config = new ConnectionConfiguration(connectionPool);
var client = new ElasticsearchClient(config);

Since "SniffingConnectionPool" still uses the defined "node" to discover the rest of the cluster, that still appears to be a single point of failure. In their example, if http://mynode.example.com:8082 is unavailable, you will have no knowledge of the other nodes in the cluster that may be available.

http://nest.azurewebsites.net/elasticsearch-net/cluster-failover.html

According to that link, SniffingConnectionPool queries the seed URIs you provide, not just a single "node". This will work, but my readings of this page and the other one conflict.

SniffingConnectionPool

This IConnectionPool implementation will sniff the cluster state on the passed seed nodes to find all the alive nodes in the cluster. It will round robin requests over all the alive nodes it knows about.
var pool = new SniffingConnectionPool(seedUris);
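To avoid depending on a single seed, several nodes can be passed as seed URIs. The following is a minimal sketch under stated assumptions: the host names are hypothetical, and the `using` namespaces are what I would expect for the Elasticsearch.Net 1.x client discussed here and may differ in other versions. The class and constructor names are the ones quoted above.

```csharp
using System;
using Elasticsearch.Net;
using Elasticsearch.Net.Connection;      // assumed namespace for ConnectionConfiguration
using Elasticsearch.Net.ConnectionPool;  // assumed namespace for SniffingConnectionPool

class Program
{
    static void Main()
    {
        // Hypothetical host names: list more than one node so that
        // discovery does not depend on any single seed being up.
        var seedUris = new[]
        {
            new Uri("http://es1.example.com:9200"),
            new Uri("http://es2.example.com:9200"),
            new Uri("http://es3.example.com:9200")
        };

        // The pool sniffs the cluster state via the seeds and then
        // round-robins requests over the alive nodes it knows about.
        var pool = new SniffingConnectionPool(seedUris);
        var config = new ConnectionConfiguration(pool);
        var client = new ElasticsearchClient(config);
    }
}
```

With three seeds, the client should still be able to discover the cluster as long as at least one of the listed seeds is reachable.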

Hi @magnusbaeck, is there any tutorial available for configuring HAProxy in front of Elasticsearch machines?

None that I know of, but any HAProxy material should do. I doubt Elasticsearch is so special that it warrants any specific documentation.

Please create your own thread for any follow-up questions.