One ELK Node is Down

Dear All,

I have an ELK cluster consisting of three nodes: elk01, elk02, and elk03. One node, elk01, suddenly went down. When I checked the log /var/log/elasticsearch/elasticsearch.log on elk01, I found these errors:

"[elk01] Authentication of [elastic] was terminated by realm [reserved] - failed to authenticate user [elastic]"
"[elk01] Authentication of [kibana_system] was terminated by realm [reserved] - failed to authenticate user [kibana_system]"

I did the following troubleshooting:

  • I restarted the elasticsearch service on all nodes.
  • I can telnet to ports 9200 and 9300 on all nodes.
  • I tried to reset the elastic password on node elk01 using this command:

/usr/share/elasticsearch/bin/elasticsearch-reset-password -i -u elastic

but I get the following error:

Error: Failed to determine the health of the cluster.

  • I set xpack.security.enabled: false in elasticsearch.yml on all nodes, restarted elasticsearch, and ran the command above again, but I still couldn't reset the password.

  • From the elk02 and elk03 nodes, I can get the index status using curl http://elk02:9200/_cat/indices?v, and all indices are green.
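
Green indices on elk02 and elk03 do not show whether elk01 is still a member of the cluster. One way to check is to compare the node list reported by a healthy node against the expected names. This is only a minimal sketch, assuming security is disabled as in the step above; missing_nodes is an illustrative helper name, not an Elasticsearch tool:

```shell
# Print every expected node name (given as arguments) that is absent from
# the node list read on stdin (one name per line, as _cat/nodes?h=name prints).
missing_nodes() {
  local joined name
  # Keep only the first column so any padding whitespace is ignored.
  joined=$(awk '{print $1}')
  for name in "$@"; do
    # -x: whole-line match, -F: treat the name as a literal string.
    if ! printf '%s\n' "$joined" | grep -qxF -- "$name"; then
      echo "$name"
    fi
  done
}

# Example usage with the hostnames from this thread:
#   curl -s 'http://elk02:9200/_cat/nodes?h=name' | missing_nodes elk01 elk02 elk03
```

Any name it prints is a node that the queried node does not currently see in the cluster.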

Note: The cluster was working fine. This issue appeared suddenly, without any changes to the configuration.

In the elasticsearch.log on elk01, I found this:

[elk01] failed to join {elk03}... Caused by: ... index [.monitoring-es-7-2023.09.14...] version not supported: 8.5.2 the node version is: 8.5.0

That means elk01 can't join the cluster because it was installed with Elasticsearch 8.5.0, while elk02 and elk03 were installed with Elasticsearch 8.5.2. The index named in the error has version 8.5.2, which the older 8.5.0 node does not support, so the issue is a version incompatibility.
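
A quick way to confirm a mismatch like this is to compare the version each node reports on its root endpoint. The following is a minimal sketch, assuming security is disabled (or credentials are passed with curl -u) and that each node answers GET / even when it has not joined the cluster; es_version_from_json and compare_versions are illustrative helper names, not Elasticsearch tools:

```shell
# Read the JSON returned by GET / on stdin and print the value of
# version.number (for example 8.5.2).
es_version_from_json() {
  sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Print "match" if all version strings given as arguments are identical,
# otherwise print "mismatch".
compare_versions() {
  local first="$1" v
  for v in "$@"; do
    if [ "$v" != "$first" ]; then
      echo "mismatch"
      return 0
    fi
  done
  echo "match"
}

# Example usage with the hostnames from this thread:
#   v1=$(curl -s http://elk01:9200/ | es_version_from_json)
#   v2=$(curl -s http://elk02:9200/ | es_version_from_json)
#   v3=$(curl -s http://elk03:9200/ | es_version_from_json)
#   compare_versions "$v1" "$v2" "$v3"
```

If this prints mismatch, the nodes are not all on the same version, which is worth ruling out before spending time on passwords or security settings.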

I upgraded elk01 to Elasticsearch 8.5.2, and that solved the problem.
