Red cluster status - nodes restarting and unallocated shards


A few days ago, my Elastic cluster status turned red for reasons that are unclear to me. Over the last 48 hours I received several emails saying "We restarted a node in your cluster after it ran out of memory.", but upon checking the memory usage on my nodes, I didn't see anything that looked critical (the memory usage on one of them is high, but not critically so). I assume the shards went unallocated because of the node restarts?

I've attached a screenshot of my monitoring dashboard.

I ran
GET /_cluster/allocation/explain

and got the following response:

{
  "index" : "dvlp-repeat-prescriptions-2018.06.28",
  "shard" : 3,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "NODE_LEFT",
    "at" : "2019-08-05T08:37:58.421Z",
    "details" : "node_left [jtfGNFgOTS2OK0-A2Bee9w]",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : -REDACTED-,
      "node_name" : "instance-0000000012",
      "transport_address" : -REDACTED-,
      "node_attributes" : {
        "logical_availability_zone" : "zone-1",
        "server_name" : -REDACTED-,
        "availability_zone" : "eu-west-1c",
        "xpack.installed" : "true",
        "region" : "eu-west-1",
        "instance_configuration" : "aws.highio.classic"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[dvlp-repeat-prescriptions-2018.06.28][3], node[qkfLTYSiQpGTDQgBF8FRuA], [P], s[STARTED], a[id=K8oCkFBOSVm6tVQxlCb62Q]]"
        }
      ]
    }
  ]
}

Any idea what the underlying problem is? I did set up Index Lifecycle Management for my log indices; could that somehow be interfering with memory usage and/or shard allocation?


As an Elastic Cloud user with a Platinum licence, your best bet is to contact your support engineer. They will be able to see much more detailed information about what has happened with this cluster and will be able to guide you through bringing it back to health.

Ok thank you. I did submit a support request on Friday morning but still haven't heard back. I'll just sit tight then!

You have far too many shards given your heap size. I would recommend you reduce this significantly.
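
To put a rough number on "too many": a guideline that is often quoted for Elasticsearch is to stay below about 20 shards per GB of heap on each node. The sketch below only does that arithmetic. The node names, heap sizes and shard counts in it are made-up placeholders, not values from this cluster; substitute what you see in `GET _cat/allocation?v` and `GET _cat/nodes?v&h=name,heap.max`. Treat the 20-per-GB figure as a rule of thumb, not a hard limit.

```python
# Rough shard-budget check per node.
# ASSUMPTION: ~20 shards per GB of heap is a commonly quoted guideline,
# not an enforced limit. All node data below is hypothetical.

from typing import Dict, Tuple


def max_recommended_shards(heap_gb: float, shards_per_gb: int = 20) -> int:
    """Return the guideline shard ceiling for a node with the given heap."""
    return int(heap_gb * shards_per_gb)


def shard_budget_report(
    nodes: Dict[str, Tuple[float, int]], shards_per_gb: int = 20
) -> Dict[str, dict]:
    """Compare each node's shard count with its guideline ceiling.

    `nodes` maps a node name to (heap size in GB, current shard count).
    """
    report = {}
    for name, (heap_gb, shard_count) in nodes.items():
        limit = max_recommended_shards(heap_gb, shards_per_gb)
        report[name] = {
            "shards": shard_count,
            "limit": limit,
            "over_budget": shard_count > limit,
        }
    return report


if __name__ == "__main__":
    # Hypothetical nodes: name -> (heap GB, shard count)
    example_nodes = {
        "node-a": (4.0, 600),
        "node-b": (4.0, 60),
    }
    for name, row in shard_budget_report(example_nodes).items():
        status = "OVER" if row["over_budget"] else "ok"
        print(f"{name}: {row['shards']} shards, guideline {row['limit']} ({status})")
```

If a node comes out well over the guideline, the usual ways to bring the count down are deleting old indices, shrinking or reindexing many small daily indices into fewer larger ones, or lowering the shard count in the index templates so that new indices start smaller.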

