Occasionally NoAliveNodesFound with HAProxy as Loadbalancer

oliver.hart · June 13, 2023, 3:45pm

Hello,

We have a somewhat unusual problem, so I'll try to explain everything in detail.

Setup

The above diagram shows our current setup (simplified).

Our app runs in a Kubernetes/OpenShift cluster. The Deployment is scaled, so each app runs as multiple Pods.
From the K8s cluster, requests go to a firewall (FW1), which is connected to another datacenter via an IPsec tunnel. FW1 does not block any outgoing traffic.

The firewall in DC 2 (FW2) allows traffic on ports 443 and 9200. It also runs an HAProxy instance that handles TLS termination and load balancing for the three Elasticsearch nodes.

Traffic from HAProxy to the Elasticsearch nodes is regular HTTPS on port 9200.

Problem

Occasionally, some of our apps throw the following error:

Elasticsearch\Common\Exceptions\NoNodesAvailableException

No alive nodes found in your cluster

We handle millions of requests per hour, but only 1-2% of them throw this error. When it happens, it occurs in only one of roughly five Pods of the same application. Sometimes several apps hit the problem at the same time; other times only one of them does.
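One plausible mechanism (an assumption, not something confirmed in this thread): when the client is given a single host, here the HAProxy address, the connection pool holds exactly one connection. A single transient failure marks that connection dead, and any request that asks the pool for a connection before a back-off period expires, including an automatic retry, fails with "No alive nodes found in your cluster" instead of the underlying network error. The PHP 8 sketch below is a simplified model of that bookkeeping, not the elasticsearch-php source; the class names, the hostname, the one-request-per-second load and the 60 s base / 3600 s cap back-off values are all assumptions for illustration.

```php
<?php
// Simplified model (NOT the elasticsearch-php source) of a static pool
// without pings. Assumed: 60 s base dead timeout, doubled per consecutive
// failure, capped at 3600 s.

final class NoAliveNodes extends RuntimeException {}

final class ModelConnection
{
    public bool $alive = true;
    public int $failures = 0;  // consecutive failures
    public int $deadSince = 0; // time (s) when the connection was marked dead

    public function __construct(public string $host) {}
}

final class ModelStaticPool
{
    /** @param ModelConnection[] $connections */
    public function __construct(
        private array $connections,
        private int $baseTimeout = 60,
        private int $maxTimeout = 3600
    ) {}

    public function next(int $now): ModelConnection
    {
        foreach ($this->connections as $c) {
            if ($c->alive) {
                return $c;
            }
            // A dead connection is handed out again only after its back-off.
            $wait = min($this->baseTimeout * 2 ** ($c->failures - 1), $this->maxTimeout);
            if ($now >= $c->deadSince + $wait) {
                return $c;
            }
        }
        throw new NoAliveNodes('No alive nodes found in your cluster');
    }

    public function markDead(ModelConnection $c, int $now): void
    {
        $c->alive = false;
        $c->failures++;
        $c->deadSince = $now;
    }

    public function markAlive(ModelConnection $c): void
    {
        $c->alive = true;
        $c->failures = 0;
    }
}

// The only "node" the client knows about is the load balancer (hypothetical host).
$lb = new ModelConnection('haproxy.example:9200');
$pool = new ModelStaticPool([$lb]);

$errors = 0;
for ($t = 0; $t < 120; $t++) { // one request per second for two minutes
    try {
        $conn = $pool->next($t);
        if ($t === 10) {
            // A single transient network failure, e.g. a dropped keep-alive connection.
            $pool->markDead($conn, $t);
            continue;
        }
        $pool->markAlive($conn); // request succeeded
    } catch (NoAliveNodes $e) {
        $errors++;
    }
}
echo "requests failing with NoAliveNodes: $errors\n"; // prints "requests failing with NoAliveNodes: 59"
```

In this model, requests at t = 11 through t = 69 hit the exception and the connection is handed out again at t = 70. How long such a window lasts in a real deployment would depend on how long the pool object lives, for example a single web request versus a long-running worker process.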

Tried fixes

We tried increasing both the connect timeout and the read timeout.
We searched through the Elasticsearch logs and also tried to reproduce the issue manually.

We also switched the connection pool from StaticNoPingConnectionPool to the regular StaticConnectionPool:

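require 'vendor/autoload.php';
use Elasticsearch\ClientBuilder;
// Replace StaticNoPingConnectionPool with StaticConnectionPool
$client = ClientBuilder::create()
    ->setConnectionPool('\Elasticsearch\ConnectionPool\StaticConnectionPool', [])
    ->build();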

Nothing worked.

After we configured our applications to connect directly to one of the nodes, bypassing the load balancer, the errors stopped.

Over the last few days we have searched the internet (Elastic Discourse, GitHub repos, the HAProxy forum, Reddit and half of Google) for a solution.

We are currently completely out of ideas.

Thanks in advance for any help.

oliver.hart · June 19, 2023, 3:00pm

For anyone who comes across this post with the same problem:

The solution was changing the connection pool.

warkolm · June 20, 2023, 4:30am

Thanks for sharing that solution!

