I am stuck in a very weird situation.
My 3-node ES cluster abruptly fails after 8-10 days with the error:
[WARN ][o.e.c.c.ClusterFormationFailureHelper] [elasticsearch-0.es-service] this node is unhealthy: health check failed due to broken node lock
No changes are made to the cluster during that period, and no other Elasticsearch instance is running.
ES Version: 8.6.2
Deployment: 3-node cluster on k8s
Can anyone please suggest a solution or points to investigate?
That error means Elasticsearch saw a change in its data directory for which it was not responsible, so it stops all write activity to protect your data. To fix it, remove whatever other process is making such changes and then restart Elasticsearch.
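As a starting point, it can help to look at the node lock file itself. A minimal sketch, assuming the default container data path `/usr/share/elasticsearch/data` (your `path.data` may differ) and that `stat`/`lsof` are available inside the node:

```shell
# Assumed data path -- adjust to your path.data setting.
ES_DATA="${ES_DATA:-/usr/share/elasticsearch/data}"

# When was the node lock last touched, and who owns it?
# A timestamp or owner that does not line up with an Elasticsearch
# restart points at an outside process.
stat -c 'mtime: %y  owner: %U:%G' "$ES_DATA/node.lock"

# Which processes currently have files open under the data path?
# On a healthy node this should list only the elasticsearch process.
command -v lsof >/dev/null && lsof +D "$ES_DATA" | head
```

Run this both on a healthy node and on the failed one and compare.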
Thanks for the reply!!
Can you also please share some insight into how to identify other processes that could modify the data directory?
Just to add: it's a 3-node cluster with each node running on a separate worker node, and the issue generally occurs after 10-12 days of the cluster running.
Common culprits include misconfigured/buggy security scanners and backup tools, but it could be anything really. You'll need to work with your local sysadmin folks to pin it down.
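One low-effort way to narrow it down is to scan the data directory for files modified around the time of the failure. A hedged sketch, assuming a Linux node with GNU `find` and a hypothetical data path (substitute your real `path.data`):

```shell
# Hypothetical data path -- replace with your path.data (assumption).
ES_DATA="${ES_DATA:-/usr/share/elasticsearch/data}"

# List files modified in the last 24 hours, newest first, with
# timestamps. Cross-check anything here against the Elasticsearch
# logs: writes that don't match ES activity came from elsewhere.
find "$ES_DATA" -type f -mmin -1440 -printf '%T@ %p\n' | sort -rn | head -n 20
```

For an ongoing watch rather than an after-the-fact scan, tools like `auditd` (an `auditctl -w <path> -p wa` rule) or `inotifywait -m -r` on the data path can record which process performed each write, though both need to be installed and may require elevated privileges on the worker node.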