Elasticsearch on Kubernetes losing Cluster Connection every hour

fewagewasd · June 15, 2020, 7:58am

Hi,

I am migrating an older application that uses Elasticsearch 2.4 (I know that's a very old version, but I'm not able to upgrade at this point) to Kubernetes. I set it up using a StatefulSet and a headless service, which basically works fine.
However, the connection between cluster nodes is lost after exactly one hour, reestablished, and then lost after an hour again.

There are no errors in the logs, just the messages from nodes joining and leaving the cluster:

[2020-06-15 04:11:21,652][INFO ][discovery.zen ] [Midas] master_left [{Isis}{2UKej6PQRya4WnEARlEZwA}{10.100.5.234}{10.100.5.234:9300}], reason [transport disconnected]
[2020-06-15 04:11:21,652][WARN ][discovery.zen ] [Midas] master left (reason = transport disconnected), current nodes: {{Stunner}{QNE6jkMyR8KqVmRLwbIWhA}{10.100.4.254}{10.100.4.254:9300},{Midas}{66GU3k9BRqGQ2PRAAxnfmQ}{10.100.3.243}{10.100.3.243:9300},}
[2020-06-15 04:11:21,652][INFO ][cluster.service ] [Midas] removed {{Isis}{2UKej6PQRya4WnEARlEZwA}{10.100.5.234}{10.100.5.234:9300},}, reason: zen-disco-master_failed ({Isis}{2UKej6PQRya4WnEARlEZwA}{10.100.5.234}{10.100.5.234:9300})
[2020-06-15 04:11:51,676][INFO ][cluster.service ] [Midas] detected_master {Isis}{2UKej6PQRya4WnEARlEZwA}{10.100.5.234}{10.100.5.234:9300}, added {{Isis}{2UKej6PQRya4WnEARlEZwA}{10.100.5.234}{10.100.5.234:9300},}, reason: zen-disco-receive(from master [{Isis}{2UKej6PQRya4WnEARlEZwA}{10.100.5.234}{10.100.5.234:9300}])
[2020-06-15 04:13:20,776][INFO ][cluster.service ] [Midas] removed {{Stunner}{QNE6jkMyR8KqVmRLwbIWhA}{10.100.4.254}{10.100.4.254:9300},}, reason: zen-disco-receive(from master [{Isis}{2UKej6PQRya4WnEARlEZwA}{10.100.5.234}{10.100.5.234:9300}])
[2020-06-15 04:13:50,794][INFO ][cluster.service ] [Midas] added {{Stunner}{QNE6jkMyR8KqVmRLwbIWhA}{10.100.4.254}{10.100.4.254:9300},}, reason: zen-disco-receive(from master [{Isis}{2UKej6PQRya4WnEARlEZwA}{10.100.5.234}{10.100.5.234:9300}])
[2020-06-15 05:11:51,660][INFO ][discovery.zen ] [Midas] master_left [{Isis}{2UKej6PQRya4WnEARlEZwA}{10.100.5.234}{10.100.5.234:9300}], reason [transport disconnected]
[2020-06-15 05:11:51,660][WARN ][discovery.zen ] [Midas] master left (reason = transport disconnected), current nodes: {{Stunner}{QNE6jkMyR8KqVmRLwbIWhA}{10.100.4.254}{10.100.4.254:9300},{Midas}{66GU3k9BRqGQ2PRAAxnfmQ}{10.100.3.243}{10.100.3.243:9300},}

I have found some other threads that suggest reducing the TCP keepalive settings on the nodes. Unfortunately, that didn't help in my case.

Does anyone know what causes this and/or what I can do to fix this?

Thanks!

fewagewasd · June 16, 2020, 8:23am

Turns out Istio was the culprit. I excluded the transport port (9300) from the Envoy sidecar by setting these pod annotations:

      annotations:
        "traffic.sidecar.istio.io/excludeInboundPorts": "9300"
        "traffic.sidecar.istio.io/excludeOutboundPorts": "9300"

Now the cluster connection is stable.
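
For anyone landing here with the same problem, here is a rough sketch of where the annotations above would sit. They are pod annotations, so in a StatefulSet they go under `spec.template.metadata.annotations`, not on the StatefulSet's own `metadata`. The names, labels, replica count, and image tag below are placeholders chosen for illustration and are not taken from the actual setup:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch            # placeholder name
spec:
  serviceName: elasticsearch     # placeholder: name of the headless service
  replicas: 3                    # assumption, based on the three nodes in the logs
  selector:
    matchLabels:
      app: elasticsearch         # placeholder label
  template:
    metadata:
      labels:
        app: elasticsearch
      annotations:
        # Bypass the Envoy sidecar for Elasticsearch transport traffic (port 9300)
        traffic.sidecar.istio.io/excludeInboundPorts: "9300"
        traffic.sidecar.istio.io/excludeOutboundPorts: "9300"
    spec:
      containers:
        - name: elasticsearch
          image: elasticsearch:2.4   # placeholder image tag
          ports:
            - containerPort: 9200
              name: http
            - containerPort: 9300
              name: transport
```

With only port 9300 excluded, HTTP traffic on 9200 would still pass through the sidecar. Changing pod template annotations only affects pods created afterwards, so the existing pods need to be restarted to pick them up.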

system · July 14, 2020, 8:23am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
