We recently upgraded our cluster from 6.8 to 7.4.2 and are now facing a major stability issue: the cluster randomly loses its master and is then unable to elect a new one.
Our current configuration is:
- 6 nodes on distinct hosts, all master-eligible
- Java heap size: 8 GB
- about 200 open indices and 300 closed ones
- relevant elasticsearch.yml settings (identical on all nodes apart from path.data and network.host):
Here are the logs (taken over the same period) from two different nodes; I can upload logs from the other machines if needed:
Occasionally the cluster forms again but loses its master shortly afterwards.
We have tried many combinations of settings, none of which seemed to make any difference.
Any help would be greatly appreciated.