Fault tolerance "address" for ES Cluster

tickermcse76 · August 12, 2015, 3:04pm

Suppose I have 5 ES nodes in a cluster; call them ES1, ES2, ES3, ES4, and ES5. I also have a farm of 20 web servers (call them W01 to W20) that need to point to this cluster. What address should the web servers point to so that, if any 1 or 2 of the ES nodes go down, the web servers can still be served by one of the remaining 3 or 4 active ES nodes?

For example, suppose the configuration has W01-W05 pointing to ES1 ... and W15-W20 pointing to ES5. If ES5 goes down, W15-W20 would not be redirected to the other nodes. How can the environment be configured so that W15-W20 are directed to ES1-ES4 if ES5 fails or is taken offline?

nirmalc · August 12, 2015, 4:48pm

You can point to any node in the cluster. I would add a few client-only nodes to the cluster and have the web servers connect randomly to one of them (optional). And yes, you could retry another node if one of the nodes is not reachable.

Thanks
Nirmal

tickermcse76 · August 12, 2015, 7:08pm

nirmalc - Thanks for your reply.

Randomly connecting to a node does not solve the issue of potentially landing on a node that could be offline/unavailable. So the solution would be to build a retry function into my application? While that might "work" it seems a poor solution for a production environment where performance is critical.

magnusbaeck · August 12, 2015, 7:17pm

Use HAProxy or similar to front your ES machines, probe machines to discover when they're down, and deflect traffic away from unavailable nodes.
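
As a rough illustration of that setup, a minimal HAProxy configuration might look like the fragment below. The host names (es1.example.com through es5.example.com), the listening port, the timeouts, and the check intervals are all assumptions made for this sketch and would need to be adapted to the actual environment. The web servers would then point at the HAProxy address instead of at an individual ES node.

```
defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend es_front
    # Address the web servers connect to (assumed port)
    bind *:9200
    default_backend es_nodes

backend es_nodes
    balance roundrobin
    # Probe each node over HTTP; a node failing 3 checks in a row is taken out of rotation
    option httpchk GET /
    server es1 es1.example.com:9200 check inter 2s fall 3 rise 2
    server es2 es2.example.com:9200 check inter 2s fall 3 rise 2
    server es3 es3.example.com:9200 check inter 2s fall 3 rise 2
    server es4 es4.example.com:9200 check inter 2s fall 3 rise 2
    server es5 es5.example.com:9200 check inter 2s fall 3 rise 2
```

Note that a single HAProxy instance becomes its own single point of failure unless it is made redundant as well.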

colings86 · August 13, 2015, 7:36am

Or use one of the official language clients. The language clients all monitor the nodes for you, detect when nodes become unavailable, and route their requests to another node in the cluster.
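
To give a feel for what such a client does internally, here is a minimal Python sketch of round-robin routing with dead-node tracking. It is not the code of any official client: the `FailoverPool` class, the `send` callback, and the 60-second `dead_timeout` are all hypothetical names and values chosen for illustration.

```python
import time


class NoLiveNodesError(Exception):
    """Raised when every known node failed for a single request."""


class FailoverPool:
    """Round-robin over nodes, skipping nodes that recently failed (illustrative only)."""

    def __init__(self, nodes, send, dead_timeout=60.0, clock=time.monotonic):
        self._nodes = list(nodes)
        self._send = send                  # hypothetical transport: send(node, path) -> response
        self._dead_timeout = dead_timeout  # seconds a failed node is skipped (assumed value)
        self._clock = clock
        self._dead_until = {}              # node -> time at which it may be tried again
        self._next = 0

    def _is_alive(self, node):
        return self._clock() >= self._dead_until.get(node, 0.0)

    def request(self, path):
        count = len(self._nodes)
        start = self._next
        self._next = (self._next + 1) % count
        order = [self._nodes[(start + i) % count] for i in range(count)]
        # Prefer nodes believed alive; if none are, try every node as a last resort.
        candidates = [node for node in order if self._is_alive(node)] or order
        for node in candidates:
            try:
                response = self._send(node, path)
            except ConnectionError:
                # Mark the node dead for a while and fail over to the next candidate.
                self._dead_until[node] = self._clock() + self._dead_timeout
                continue
            self._dead_until.pop(node, None)
            return response
        raise NoLiveNodesError("all nodes failed for " + path)


if __name__ == "__main__":
    def demo_send(node, path):
        # Stand-in transport for the demo: pretend ES5 is offline.
        if node == "ES5":
            raise ConnectionError("ES5 is down")
        return node + " answered " + path

    pool = FailoverPool(["ES1", "ES2", "ES3", "ES4", "ES5"], demo_send)
    for _ in range(6):
        print(pool.request("/_search"))
```

With this approach the cost of a dead node is one failed attempt per timeout window rather than a retry on every request, which is the main difference from picking a node at random each time.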

mosiddi · August 14, 2015, 12:04pm

We have our caller code in .NET and this client [http://nest.azurewebsites.net/] does the load balancing / fault tolerance well.

tickermcse76 · August 26, 2015, 5:07pm

I reviewed the Elasticsearch.Net client as an example.

http://nest.azurewebsites.net/elasticsearch-net/connecting.html

Here instead of directly passing node, we pass a SniffingConnectionPool, which will use our node to find out the rest of the available cluster nodes:

var node = new Uri("http://mynode.example.com:8082/apiKey");
// The pool takes a collection of URIs, so wrap the single node in an array
var connectionPool = new SniffingConnectionPool(new[] { node });
var config = new ConnectionConfiguration(connectionPool);
var client = new ElasticsearchClient(config);

Since "SniffingConnectionPool" still uses the defined "node" to discover the rest of the cluster, that still appears to be a single point of failure. In their example if http://mynode.example.com:8082 is unavailable, you will not have knowledge of other nodes in the cluster that may be available.

tickermcse76 · August 26, 2015, 5:46pm

http://nest.azurewebsites.net/elasticsearch-net/cluster-failover.html

According to that link, SniffingConnectionPool queries the seed URIs you provide rather than a single "node". That will work, but as I read them, this page and the other one conflict.

SniffingConnectionPool

This IConnectionPool implementation will sniff the cluster state on the passed seed nodes to find all the alive nodes in the cluster. It will round robin requests over all the alive nodes it knows about.
var pool = new SniffingConnectionPool(seedUris);
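
To make the difference between one "node" and several seed URIs concrete, here is a small Python sketch of the idea. It is not how Elasticsearch.Net implements sniffing: `discover_nodes` and the `fetch_nodes` callback are hypothetical names, and the host names in the usage note are placeholders. The point is only that discovery succeeds as long as at least one seed is reachable.

```python
def discover_nodes(seed_uris, fetch_nodes):
    """Return the node list reported by the first seed that answers.

    fetch_nodes(seed) is a hypothetical callback that asks one seed node for the
    cluster's node addresses and raises ConnectionError if the seed is unreachable.
    """
    errors = []
    for seed in seed_uris:
        try:
            nodes = fetch_nodes(seed)
        except ConnectionError as exc:
            # This seed is down; remember why and try the next one.
            errors.append("{}: {}".format(seed, exc))
            continue
        if nodes:
            return list(nodes)
    raise ConnectionError("no seed returned a node list: " + "; ".join(errors))
```

Called with a list such as `["http://es1.example.com:9200", "http://es2.example.com:9200"]`, the function still returns the cluster's nodes when ES1 is offline, whereas a single-entry seed list fails outright. Passing two or three seeds is what removes the single point of failure at startup.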

Ankit_Singhal · July 7, 2016, 1:31am

Hi @magnusbaeck. Is there any tutorial available for configuring HAProxy in front of Elastic machines?

magnusbaeck · July 7, 2016, 5:36am

Is there any tutorial available for configuring HAProxy in front of Elastic machines?

None that I know of but any HAProxy material should do. I doubt Elasticsearch is so special it warrants any specific documentation.

Please create your own thread for any follow-up questions.
