Would a node failure take down the cluster?

I'm trying to troubleshoot why a 3-node cluster (all nodes master and data) would go down. I'm mainly looking for direction on where to look for issues and any possible solutions.

Currently running version 7.0.1
Indices - 19
Max Memory - 38.8 GB
Total Shards - 38
Documents - 8,136,787
Data Size - 507.2 GB

Going through the logs, I found a lot of the following exceptions that seem to chain together.

  • CircuitBreakingException: Data too large, data for [<transport_request>]
  • Cluster health status changed from [GREEN] to [YELLOW] failed to list shard for shard_store on node
  • AlreadyClosedException engine is closed
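One place to start with the CircuitBreakingException is the per-node breaker statistics returned by `GET _nodes/stats/breaker` (for example `curl -s localhost:9200/_nodes/stats/breaker > breakers.json`). Below is a minimal sketch, not part of any Elastic client. It assumes the response has the usual `nodes.<id>.breakers.<name>` shape with `limit_size_in_bytes`, `estimated_size_in_bytes` and `tripped` fields, and it flags breakers that are running close to their limit or have tripped since the node started. The helper names and the 0.8 threshold are hypothetical choices for illustration.

```python
import json


def breaker_report(stats, threshold=0.8):
    """Return sorted (node_name, breaker_name, usage_ratio, tripped_count) tuples
    for breakers at or above `threshold` of their limit, or that have tripped."""
    flagged = []
    for node in stats.get("nodes", {}).values():
        node_name = node.get("name", "?")
        for breaker_name, b in node.get("breakers", {}).items():
            limit = b.get("limit_size_in_bytes", 0)
            used = b.get("estimated_size_in_bytes", 0)
            # Guard against missing or non-positive limits instead of dividing by zero.
            ratio = used / limit if limit > 0 else 0.0
            tripped = b.get("tripped", 0)
            if ratio >= threshold or tripped > 0:
                flagged.append((node_name, breaker_name, round(ratio, 2), tripped))
    return sorted(flagged)


def breaker_report_from_file(path, threshold=0.8):
    """Load a saved _nodes/stats/breaker response from disk and report on it."""
    with open(path) as f:
        return breaker_report(json.load(f), threshold)
```

Running `breaker_report_from_file("breakers.json")` on each affected cluster shows which breaker (often `parent` when the message mentions `<transport_request>`) is near its limit and on which node, which helps decide whether to look at heap sizing, request sizes or shard recovery traffic.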

Thank you

The 7.0.x series is pretty old and reaches the end of its supported life tomorrow. There have been a good number of resiliency improvements since its release, so before digging deeper I suggest you upgrade to the latest version.


Thanks, I'll recommend upgrading. I'm going to see if I can get a server to test the upgrade.