Coordinating nodes removed from cluster after timeout

Hello, everyone!
We have 15 nodes in our ES cluster: 3 master nodes, 9 data nodes, and 3 coordinating (client) nodes.
The cluster runs on 3 physical hosts, with 5 ES nodes deployed on each host (1 master, 1 coordinating/client, 3 data). Each host has 46 CPUs and 512 GB of RAM.
Every day, the 3 client nodes leave the cluster at random times and automatically rejoin after about 10 minutes. While the problem was occurring, queries and writes were running, but the request volume was low and the hosts had plenty of spare resources, so this does not look like a resource shortage.
We have been running ping continuously and the network looks fine, with no packet loss.
Has anyone encountered a similar problem?

This is the coordinating node log:

[2022-12-29T01:28:50,400][INFO ][o.e.d.z.ZenDiscovery ] [xxxx-001-kzx_client] master_left [{xxxx-003-kzx_master}{Qxixi5PtQbOVI9lUOz94nA}{RozHaYueRsCPj1X_4DxEuA}{xxxx.40}{xxxx.40:9300}{xpack.installed=true}], reason [failed to ping, tried [3] times, each with maximum [30s] timeout]
[2022-12-29T01:28:50,401][WARN ][o.e.d.z.ZenDiscovery ] [xxxx-001-kzx_client] master left (reason = failed to ping, tried [3] times, each with maximum [30s] timeout)
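
To make the numbers in these log lines easier to reason about, here is a minimal Python sketch that pulls the retry count and per-ping timeout out of the reason text and multiplies them. The function name `ping_failure_window` is just an illustrative name, and the result is only a rough upper bound on how long the node kept retrying before it gave up on the master (it ignores the interval between pings). As far as I know, 3 retries and 30s match the 6.x defaults for `discovery.zen.fd.ping_retries` and `discovery.zen.fd.ping_timeout`, but please treat that as an assumption.

```python
import re

# Matches the reason text in the ZenDiscovery "master_left" / "master left" lines above.
PING_FAILURE = re.compile(
    r"failed to ping, tried \[(?P<retries>\d+)\] times, "
    r"each with maximum \[(?P<timeout>\d+)s\] timeout"
)


def ping_failure_window(line):
    """Return (retries, timeout_seconds, retries * timeout_seconds),
    or None if the line does not contain the ping-failure reason."""
    match = PING_FAILURE.search(line)
    if match is None:
        return None
    retries = int(match.group("retries"))
    timeout = int(match.group("timeout"))
    # Rough upper bound only: the interval between pings is not included.
    return retries, timeout, retries * timeout


sample = (
    "[2022-12-29T01:28:50,401][WARN ][o.e.d.z.ZenDiscovery ] "
    "[xxxx-001-kzx_client] master left (reason = failed to ping, "
    "tried [3] times, each with maximum [30s] timeout)"
)
print(ping_failure_window(sample))
```

So, by this rough estimate, the client node could have been failing to reach the master for up to about 90 seconds before it logged `master left`.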

This is the master node log

Hi there @chengdihua and welcome!

You're using version 6.8.5 which became unsupported years ago. It definitely had bugs which could lead to these symptoms. You should upgrade to a supported version ASAP.

