Hi everyone,
I have an Elasticsearch (7.10.2) cluster with 11 nodes (3 master, 8 data).
At random times, data nodes start disconnecting from the cluster. The affected node is healthy but shows as offline, and it rejoins after about 15 minutes.
After reading several discussions on the subject, I configured TCP keepalive:
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 60
net.ipv4.tcp_keepalive_probes = 20
After running with this configuration for a few days, the same problem occurred again.
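For reference, here is a rough sketch of what these three values imply, assuming the usual Linux keepalive behaviour: the first probe goes out after tcp_keepalive_time seconds of idle time, further probes follow every tcp_keepalive_intvl seconds, and the connection is dropped once tcp_keepalive_probes probes go unanswered. The function name is just an illustration, not part of any tool.

```python
def keepalive_detection_seconds(time_s: int, intvl_s: int, probes: int) -> int:
    """Approximate time for the kernel to declare an idle, dead connection broken.

    Assumes standard Linux keepalive semantics: wait `time_s` seconds of idle
    time, then send `probes` probes spaced `intvl_s` seconds apart.
    """
    return time_s + intvl_s * probes


# Values from the sysctl settings above.
total = keepalive_detection_seconds(time_s=600, intvl_s=60, probes=20)
print(f"{total} seconds (~{total / 60:.0f} minutes)")
```

With the settings above this comes to 1800 seconds, so keepalive alone would take around 30 minutes to notice a dead idle connection, which is longer than the roughly 15 minutes the nodes stay disconnected.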
What is in the logs of the nodes that are disconnecting? Given that you are using version 7.10.2, do you have any third-party plugins installed that could affect the cluster behaviour?