Allocation Failed

Hi,

I've got many error messages like this from the cluster allocation explain API.
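A request along these lines produces the output below (the index/shard/primary body fields name the shard in question):

GET _cluster/allocation/explain
{
  "index": "logstash-prod_operations_clear-001098",
  "shard": 0,
  "primary": false
}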

{
  "index" : "logstash-prod_operations_clear-001098",
  "shard" : 0,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "ALLOCATION_FAILED",
    "at" : "2023-03-07T09:23:48.390Z",
    "failed_allocation_attempts" : 5,
    "details" : "failed shard on node [8zemZfddQOm3iNFio-GgsA]: failed recovery, failure RecoveryFailedException[[logstash-prod_operations_clear-001098][0]: Recovery failed from {elastic-cold-com-15-rz1}{UTpMrVw1SW6PAJlgjzEbAg}{ng25sO6yTmyxVSoJighbZQ}{10.1.6.87}{10.1.6.87:9300}{cdfhrstw}{rz=rz1, xpack.installed=true, storage=hdd, transform.node=true} into {elastic-cold-com-16-rz2}{8zemZfddQOm3iNFio-GgsA}{7RT9r9OeQbKZ0qz7E7eECw}{10.2.6.87}{10.2.6.87:9300}{cdfhrstw}{xpack.installed=true, transform.node=true, rz=rz2, storage=hdd}]; nested: RemoteTransportException[[elastic-cold-com-15-rz1][10.1.6.87:9300][internal:index/shard/recovery/start_recovery]]; nested: CircuitBreakingException[[parent] Data too large, data for [internal:index/shard/recovery/start_recovery] would be [6363867618/5.9gb], which is larger than the limit of [6120328396/5.6gb], real usage: [6363866232/5.9gb], new bytes reserved: [1386/1.3kb], usages [request=8736/8.5kb, fielddata=144290/140.9kb, in_flight_requests=1386/1.3kb, model_inference=0/0b, accounting=578797672/551.9mb]]; ",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "awaiting_info",
  "allocate_explanation" : "cannot allocate because information about existing shard data is still being retrieved from some of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "-LZu73W2TgOa87dU87Nx0A",
      "node_name" : "elastic-cold-com-22-rz2",
      "transport_address" : "10.2.6.91:9300",
      "node_attributes" : {
        "rz" : "rz2",
        "xpack.installed" : "true",
        "storage" : "hdd",
        "transform.node" : "true"
      },
      "node_decision" : "yes"
    },
    {
      "node_id" : "3jgTzoQdQJSpQaLaQhlMPg",
      "node_name" : "elastic-cold-com-4-rz2",
      "transport_address" : "10.2.6.78:9300",
      "node_attributes" : {
        "rz" : "rz2",
        "xpack.installed" : "true",
        "storage" : "hdd",
        "transform.node" : "true"
      },
      "node_decision" : "yes"
    },
    {
      "node_id" : "6fUpig1RQc-161P30ZG1CA",
      "node_name" : "elastic-cold-com-6-rz2",
      "transport_address" : "10.2.6.79:9300",
      "node_attributes" : {
        "rz" : "rz2",
        "xpack.installed" : "true",
        "storage" : "hdd",
        "transform.node" : "true"
      },
      "node_decision" : "yes"
    },
    {
      "node_id" : "8zemZfddQOm3iNFio-GgsA",
      "node_name" : "elastic-cold-com-16-rz2",
      "transport_address" : "10.2.6.87:9300",
      "node_attributes" : {
        "rz" : "rz2",
        "xpack.installed" : "true",
        "storage" : "hdd",
        "transform.node" : "true"
      },
      "node_decision" : "yes"
    },
    {
      "node_id" : "CE_ICelORTCTTNv8GzcICg",
      "node_name" : "elastic-cold-com-8-rz2",
      "transport_address" : "10.2.6.82:9300",
      "node_attributes" : {
        "rz" : "rz2",
        "xpack.installed" : "true",
        "storage" : "hdd",
        "transform.node" : "true"
      },
      "node_decision" : "yes"
    },
    {
      "node_id" : "P3qZHvSQTReIUUtUw-J6iA",
      "node_name" : "elastic-cold-com-2-rz2",
      "transport_address" : "10.2.6.77:9300",
      "node_attributes" : {
        "rz" : "rz2",
        "xpack.installed" : "true",
        "storage" : "hdd",
        "transform.node" : "true"
      },
      "node_decision" : "yes"
    },
    {
      "node_id" : "WTFqNa4zTh-dHsp7aryo2w",
      "node_name" : "elastic-cold-com-18-rz2",
      "transport_address" : "10.2.6.88:9300",
      "node_attributes" : {
        "rz" : "rz2",
        "xpack.installed" : "true",
        "storage" : "hdd",
        "transform.node" : "true"
      },
      "node_decision" : "yes"
    },
    {
      "node_id" : "Wv3DS25qTimNewVFmS-L_A",
      "node_name" : "elastic-cold-com-10-rz2",
      "transport_address" : "10.2.6.84:9300",
      "node_attributes" : {
        "rz" : "rz2",
        "xpack.installed" : "true",
        "storage" : "hdd",
        "transform.node" : "true"
      },
      "node_decision" : "yes"
    },
    {
      "node_id" : "diEioxR1RRSHP8GEg9Vn8g",
      "node_name" : "elastic-cold-com-12-rz2",
      "transport_address" : "10.2.6.85:9300",
      "node_attributes" : {
        "rz" : "rz2",
        "xpack.installed" : "true",
        "storage" : "hdd",
        "transform.node" : "true"
      },
      "node_decision" : "yes"
    },
    {
      "node_id" : "1d0fBM28Q2W9J3lAXg6BEA",
      "node_name" : "elastic-cold-com-20-rz2",
      "transport_address" : "10.2.6.89:9300",
      "node_attributes" : {
        "rz" : "rz2",
        "xpack.installed" : "true",
        "storage" : "hdd",
        "transform.node" : "true"
      },
      "node_decision" : "throttled",
      "deciders" : [
        {
          "decider" : "throttling",
          "decision" : "THROTTLE",
          "explanation" : "reached the limit of incoming shard recoveries [2], cluster setting [cluster.routing.allocation.node_concurrent_incoming_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])"
        }
      ]
    },
    {
      "node_id" : "KLV3tbJvRmGC8GdgpXV0vQ",
      "node_name" : "elastic-cold-com-14-rz2",
      "transport_address" : "10.2.6.86:9300",
      "node_attributes" : {
        "rz" : "rz2",
        "xpack.installed" : "true",
        "storage" : "hdd",
        "transform.node" : "true"
      },
      "node_decision" : "throttled",
      "deciders" : [
        {
          "decider" : "throttling",
          "decision" : "THROTTLE",
          "explanation" : "reached the limit of incoming shard recoveries [2], cluster setting [cluster.routing.allocation.node_concurrent_incoming_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])"
        }
      ]
    },
    {
      "node_id" : "4aylJLz8SJ2P64ciJVtPtg",
      "node_name" : "elastic-cold-com-9-rz1",
      "transport_address" : "10.1.6.84:9300",
      "node_attributes" : {
        "rz" : "rz1",
        "xpack.installed" : "true",
        "storage" : "hdd",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "awareness",
          "decision" : "NO",
          "explanation" : "there are too many copies of the shard allocated to nodes with attribute [rz], there are [2] total configured shard copies for this shard id and [2] total attribute values, expected the allocated shard count per attribute [2] to be less than or equal to the upper bound of the required number of shards per attribute [1]"
        }
      ]
    },
    {
      "node_id" : "7nMPX7b-T8KDLWIkqQ8afg",
      "node_name" : "elastic-cold-com-13-rz1",
      "transport_address" : "10.1.6.86:9300",
      "node_attributes" : {
        "rz" : "rz1",
        "xpack.installed" : "true",
        "storage" : "hdd",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "awareness",
          "decision" : "NO",
          "explanation" : "there are too many copies of the shard allocated to nodes with attribute [rz], there are [2] total configured shard copies for this shard id and [2] total attribute values, expected the allocated shard count per attribute [2] to be less than or equal to the upper bound of the required number of shards per attribute [1]"
        }
      ]
    },
    {
      "node_id" : "9pz0HbzKQhi05UW9Zpb-ng",
      "node_name" : "elastic-hot-com-6-rz2",
      "transport_address" : "10.2.6.74:9300",
      "node_attributes" : {
        "rz" : "rz2",
        "xpack.installed" : "true",
        "storage" : "ssd",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "filter",
          "decision" : "NO",
          "explanation" : """node does not match index setting [index.routing.allocation.require] filters [storage:"hdd"]"""
        }
      ]
    },
    {
      "node_id" : "H-HREaclQce2POdNnx7-MQ",
      "node_name" : "elastic-cold-com-11-rz1",
      "transport_address" : "10.1.6.85:9300",
      "node_attributes" : {
        "rz" : "rz1",
        "xpack.installed" : "true",
        "storage" : "hdd",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "awareness",
          "decision" : "NO",
          "explanation" : "there are too many copies of the shard allocated to nodes with attribute [rz], there are [2] total configured shard copies for this shard id and [2] total attribute values, expected the allocated shard count per attribute [2] to be less than or equal to the upper bound of the required number of shards per attribute [1]"
        }
      ]
    },
    {
      "node_id" : "JivunSuJRfucH1OkjOsxWw",
      "node_name" : "elastic-cold-com-3-rz1",
      "transport_address" : "10.1.6.78:9300",
      "node_attributes" : {
        "rz" : "rz1",
        "xpack.installed" : "true",
        "storage" : "hdd",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "awareness",
          "decision" : "NO",
          "explanation" : "there are too many copies of the shard allocated to nodes with attribute [rz], there are [2] total configured shard copies for this shard id and [2] total attribute values, expected the allocated shard count per attribute [2] to be less than or equal to the upper bound of the required number of shards per attribute [1]"
        }
      ]
    },
    {
      "node_id" : "NH8YRPyZS-OzJVs-Qn8wCg",
      "node_name" : "elastic-cold-com-5-rz1",
      "transport_address" : "10.1.6.79:9300",
      "node_attributes" : {
        "rz" : "rz1",
        "xpack.installed" : "true",
        "storage" : "hdd",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "awareness",
          "decision" : "NO",
          "explanation" : "there are too many copies of the shard allocated to nodes with attribute [rz], there are [2] total configured shard copies for this shard id and [2] total attribute values, expected the allocated shard count per attribute [2] to be less than or equal to the upper bound of the required number of shards per attribute [1]"
        }
      ]
    },
    {
      "node_id" : "TggXs0TRQx2MRqYjve7YHw",
      "node_name" : "elastic-cold-com-21-rz1",
      "transport_address" : "10.1.6.90:9300",
      "node_attributes" : {
        "rz" : "rz1",
        "xpack.installed" : "true",
        "storage" : "hdd",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "disk_threshold",
          "decision" : "NO",
          "explanation" : "the node is above the low watermark cluster setting [cluster.routing.allocation.disk.watermark.low=90%], using more disk space than the maximum allowed [90.0%], actual free: [7.483555394733263%]"
        },
        {
          "decider" : "awareness",
          "decision" : "NO",
          "explanation" : "there are too many copies of the shard allocated to nodes with attribute [rz], there are [2] total configured shard copies for this shard id and [2] total attribute values, expected the allocated shard count per attribute [2] to be less than or equal to the upper bound of the required number of shards per attribute [1]"
        }
      ]
    },
    {
      "node_id" : "TndIMiQSTS-hJkapNLJHSw",
      "node_name" : "elastic-hot-com-4-rz2",
      "transport_address" : "10.2.6.76:9300",
      "node_attributes" : {
        "rz" : "rz2",
        "xpack.installed" : "true",
        "storage" : "ssd",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "filter",
          "decision" : "NO",
          "explanation" : """node does not match index setting [index.routing.allocation.require] filters [storage:"hdd"]"""
        }
      ]
    },
    {
      "node_id" : "Tp-57eKqTByni5z3Sy8_Aw",
      "node_name" : "elastic-cold-com-17-rz1",
      "transport_address" : "10.1.6.88:9300",
      "node_attributes" : {
        "rz" : "rz1",
        "xpack.installed" : "true",
        "storage" : "hdd",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "awareness",
          "decision" : "NO",
          "explanation" : "there are too many copies of the shard allocated to nodes with attribute [rz], there are [2] total configured shard copies for this shard id and [2] total attribute values, expected the allocated shard count per attribute [2] to be less than or equal to the upper bound of the required number of shards per attribute [1]"
        }
      ]
    },
    {
      "node_id" : "UTpMrVw1SW6PAJlgjzEbAg",
      "node_name" : "elastic-cold-com-15-rz1",
      "transport_address" : "10.1.6.87:9300",
      "node_attributes" : {
        "rz" : "rz1",
        "xpack.installed" : "true",
        "storage" : "hdd",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "a copy of this shard is already allocated to this node [[logstash-prod_operations_clear-001098][0], node[UTpMrVw1SW6PAJlgjzEbAg], [P], s[STARTED], a[id=ZjhFWijkTuuhoRiBUDsDAA]]"
        },
        {
          "decider" : "throttling",
          "decision" : "THROTTLE",
          "explanation" : "reached the limit of incoming shard recoveries [2], cluster setting [cluster.routing.allocation.node_concurrent_incoming_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])"
        },
        {
          "decider" : "awareness",
          "decision" : "NO",
          "explanation" : "there are too many copies of the shard allocated to nodes with attribute [rz], there are [2] total configured shard copies for this shard id and [2] total attribute values, expected the allocated shard count per attribute [2] to be less than or equal to the upper bound of the required number of shards per attribute [1]"
        }
      ]
    },
    {
      "node_id" : "bN-qQ0iXTEm7zSMXX3jVYg",
      "node_name" : "elastic-cold-com-7-rz1",
      "transport_address" : "10.1.6.82:9300",
      "node_attributes" : {
        "rz" : "rz1",
        "xpack.installed" : "true",
        "storage" : "hdd",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "awareness",
          "decision" : "NO",
          "explanation" : "there are too many copies of the shard allocated to nodes with attribute [rz], there are [2] total configured shard copies for this shard id and [2] total attribute values, expected the allocated shard count per attribute [2] to be less than or equal to the upper bound of the required number of shards per attribute [1]"
        }
      ]
    },
    {
      "node_id" : "dpuOVN5qR2qlGHTT3RXylQ",
      "node_name" : "elastic-cold-com-23-rz1",
      "transport_address" : "10.1.6.91:9300",
      "node_attributes" : {
        "rz" : "rz1",
        "xpack.installed" : "true",
        "storage" : "hdd",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "throttling",
          "decision" : "THROTTLE",
          "explanation" : "reached the limit of incoming shard recoveries [2], cluster setting [cluster.routing.allocation.node_concurrent_incoming_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])"
        },
        {
          "decider" : "awareness",
          "decision" : "NO",
          "explanation" : "there are too many copies of the shard allocated to nodes with attribute [rz], there are [2] total configured shard copies for this shard id and [2] total attribute values, expected the allocated shard count per attribute [2] to be less than or equal to the upper bound of the required number of shards per attribute [1]"
        }
      ]
    },
    {
      "node_id" : "fkkU0UULQtO-ETf2QcN8ww",
      "node_name" : "elastic-cold-com-19-rz1",
      "transport_address" : "10.1.6.89:9300",
      "node_attributes" : {
        "rz" : "rz1",
        "xpack.installed" : "true",
        "storage" : "hdd",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "awareness",
          "decision" : "NO",
          "explanation" : "there are too many copies of the shard allocated to nodes with attribute [rz], there are [2] total configured shard copies for this shard id and [2] total attribute values, expected the allocated shard count per attribute [2] to be less than or equal to the upper bound of the required number of shards per attribute [1]"
        }
      ]
    },
    {
      "node_id" : "luOxP-ExTi26KigJb8J0ng",
      "node_name" : "elastic-hot-com-3-rz1",
      "transport_address" : "10.1.6.76:9300",
      "node_attributes" : {
        "rz" : "rz1",
        "xpack.installed" : "true",
        "storage" : "ssd",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "filter",
          "decision" : "NO",
          "explanation" : """node does not match index setting [index.routing.allocation.require] filters [storage:"hdd"]"""
        },
        {
          "decider" : "awareness",
          "decision" : "NO",
          "explanation" : "there are too many copies of the shard allocated to nodes with attribute [rz], there are [2] total configured shard copies for this shard id and [2] total attribute values, expected the allocated shard count per attribute [2] to be less than or equal to the upper bound of the required number of shards per attribute [1]"
        }
      ]
    },
    {
      "node_id" : "mPXUU9zNQrKic0sPxz7pVQ",
      "node_name" : "elastic-hot-com-2-rz2",
      "transport_address" : "10.2.6.75:9300",
      "node_attributes" : {
        "rz" : "rz2",
        "xpack.installed" : "true",
        "storage" : "ssd",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "filter",
          "decision" : "NO",
          "explanation" : """node does not match index setting [index.routing.allocation.require] filters [storage:"hdd"]"""
        }
      ]
    },
    {
      "node_id" : "oDPOAMY2RQWmdpqDqONOzA",
      "node_name" : "elastic-hot-com-5-rz1",
      "transport_address" : "10.1.6.77:9300",
      "node_attributes" : {
        "rz" : "rz1",
        "xpack.installed" : "true",
        "storage" : "ssd",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "filter",
          "decision" : "NO",
          "explanation" : """node does not match index setting [index.routing.allocation.require] filters [storage:"hdd"]"""
        },
        {
          "decider" : "awareness",
          "decision" : "NO",
          "explanation" : "there are too many copies of the shard allocated to nodes with attribute [rz], there are [2] total configured shard copies for this shard id and [2] total attribute values, expected the allocated shard count per attribute [2] to be less than or equal to the upper bound of the required number of shards per attribute [1]"
        }
      ]
    },
    {
      "node_id" : "w3GlzXS1T66aVrZJPftZ9A",
      "node_name" : "elastic-hot-com-1-rz1",
      "transport_address" : "10.1.6.75:9300",
      "node_attributes" : {
        "rz" : "rz1",
        "xpack.installed" : "true",
        "storage" : "ssd",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "filter",
          "decision" : "NO",
          "explanation" : """node does not match index setting [index.routing.allocation.require] filters [storage:"hdd"]"""
        },
        {
          "decider" : "awareness",
          "decision" : "NO",
          "explanation" : "there are too many copies of the shard allocated to nodes with attribute [rz], there are [2] total configured shard copies for this shard id and [2] total attribute values, expected the allocated shard count per attribute [2] to be less than or equal to the upper bound of the required number of shards per attribute [1]"
        }
      ]
    }
  ]
}

Can someone give me advice on what I can do?

You're using a relatively old version of ES, and in newer versions the message now reads as follows:

Elasticsearch is retrieving information about this shard from one or more
nodes. It will make an allocation decision after it receives this
information. Please wait.

As it says, you just have to wait.
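That said, note the failed_allocation_attempts of 5 in your output: that is the default retry limit (index.allocation.max_retries), so once the underlying circuit breaker problem is fixed the cluster will not retry these shards on its own. A minimal nudge, assuming the memory pressure has been dealt with first, is:

POST _cluster/reroute?retry_failed=true

That asks the allocator to have another go at every shard that has exhausted its retries.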

Hey David,

thanks for the quick response.
But the problem is that when I run a query in Kibana Discover, it has been showing "shards failed" for over 4 weeks, and I very often get wrong or incomplete log results.

So I do wait, often for a long time, but ES still shows wrong results.

What is the full output of the cluster stats API?
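That is:

GET _cluster/stats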

Hi Christian,

{
  "_nodes" : {
    "total" : 31,
    "successful" : 25,
    "failed" : 6,
    "failures" : [
      {
        "type" : "failed_node_exception",
        "reason" : "Failed node [fkkU0UULQtO-ETf2QcN8ww]",
        "node_id" : "fkkU0UULQtO-ETf2QcN8ww",
        "caused_by" : {
          "type" : "circuit_breaking_exception",
          "reason" : "[parent] Data too large, data for [cluster:monitor/stats[n]] would be [6345608124/5.9gb], which is larger than the limit of [6120328396/5.6gb], real usage: [6345590496/5.9gb], new bytes reserved: [17628/17.2kb], usages [request=10176/9.9kb, fielddata=127728/124.7kb, in_flight_requests=17628/17.2kb, model_inference=0/0b, accounting=577140864/550.4mb]",
          "bytes_wanted" : 6345608124,
          "bytes_limit" : 6120328396,
          "durability" : "PERMANENT"
        }
      },
      {
        "type" : "failed_node_exception",
        "reason" : "Failed node [dpuOVN5qR2qlGHTT3RXylQ]",
        "node_id" : "dpuOVN5qR2qlGHTT3RXylQ",
        "caused_by" : {
          "type" : "circuit_breaking_exception",
          "reason" : "[parent] Data too large, data for [cluster:monitor/stats[n]] would be [6356287508/5.9gb], which is larger than the limit of [6120328396/5.6gb], real usage: [6356269880/5.9gb], new bytes reserved: [17628/17.2kb], usages [request=8960/8.7kb, fielddata=127134/124.1kb, in_flight_requests=17628/17.2kb, model_inference=0/0b, accounting=609030408/580.8mb]",
          "bytes_wanted" : 6356287508,
          "bytes_limit" : 6120328396,
          "durability" : "PERMANENT"
        }
      },
      {
        "type" : "failed_node_exception",
        "reason" : "Failed node [TggXs0TRQx2MRqYjve7YHw]",
        "node_id" : "TggXs0TRQx2MRqYjve7YHw",
        "caused_by" : {
          "type" : "circuit_breaking_exception",
          "reason" : "[parent] Data too large, data for [cluster:monitor/stats[n]] would be [6364564980/5.9gb], which is larger than the limit of [6120328396/5.6gb], real usage: [6364547352/5.9gb], new bytes reserved: [17628/17.2kb], usages [request=2832/2.7kb, fielddata=127794/124.7kb, in_flight_requests=19000/18.5kb, model_inference=0/0b, accounting=602872088/574.9mb]",
          "bytes_wanted" : 6364564980,
          "bytes_limit" : 6120328396,
          "durability" : "PERMANENT"
        }
      },
      {
        "type" : "failed_node_exception",
        "reason" : "Failed node [UTpMrVw1SW6PAJlgjzEbAg]",
        "node_id" : "UTpMrVw1SW6PAJlgjzEbAg",
        "caused_by" : {
          "type" : "circuit_breaking_exception",
          "reason" : "[parent] Data too large, data for [cluster:monitor/stats[n]] would be [6333789116/5.8gb], which is larger than the limit of [6120328396/5.6gb], real usage: [6333771488/5.8gb], new bytes reserved: [17628/17.2kb], usages [request=14824/14.4kb, fielddata=111174/108.5kb, in_flight_requests=17628/17.2kb, model_inference=0/0b, accounting=577661880/550.9mb]",
          "bytes_wanted" : 6333789116,
          "bytes_limit" : 6120328396,
          "durability" : "PERMANENT"
        }
      },
      {
        "type" : "failed_node_exception",
        "reason" : "Failed node [diEioxR1RRSHP8GEg9Vn8g]",
        "node_id" : "diEioxR1RRSHP8GEg9Vn8g",
        "caused_by" : {
          "type" : "circuit_breaking_exception",
          "reason" : "[parent] Data too large, data for [cluster:monitor/stats[n]] would be [6267052180/5.8gb], which is larger than the limit of [6120328396/5.6gb], real usage: [6267034552/5.8gb], new bytes reserved: [17628/17.2kb], usages [request=28384/27.7kb, fielddata=44611/43.5kb, in_flight_requests=19000/18.5kb, model_inference=0/0b, accounting=585680200/558.5mb]",
          "bytes_wanted" : 6267052180,
          "bytes_limit" : 6120328396,
          "durability" : "PERMANENT"
        }
      },
      {
        "type" : "failed_node_exception",
        "reason" : "Failed node [7nMPX7b-T8KDLWIkqQ8afg]",
        "node_id" : "7nMPX7b-T8KDLWIkqQ8afg",
        "caused_by" : {
          "type" : "circuit_breaking_exception",
          "reason" : "[parent] Data too large, data for [cluster:monitor/stats[n]] would be [6296061804/5.8gb], which is larger than the limit of [6120328396/5.6gb], real usage: [6296044176/5.8gb], new bytes reserved: [17628/17.2kb], usages [request=8656/8.4kb, fielddata=91200/89kb, in_flight_requests=17628/17.2kb, model_inference=0/0b, accounting=579387800/552.5mb]",
          "bytes_wanted" : 6296061804,
          "bytes_limit" : 6120328396,
          "durability" : "PERMANENT"
        }
      }
    ]
  },
  "cluster_name" : "elastic-com-c1",
  "cluster_uuid" : "pk3BirT2SB-Z5MAPU5vC8A",
  "timestamp" : 1679331894949,
  "status" : "yellow",
  "indices" : {
    "count" : 2068,
    "shards" : {
      "total" : 16004,
      "primaries" : 7878,
      "replication" : 1.0314800710840315,
      "index" : {
        "shards" : {
          "min" : 1,
          "max" : 16,
          "avg" : 7.738878143133462
        },
        "primaries" : {
          "min" : 0,
          "max" : 8,
          "avg" : 3.8094777562862667
        },
        "replication" : {
          "min" : 0.0,
          "max" : 5.0,
          "avg" : 0.8762532237266313
        }
      }
    },
    "docs" : {
      "count" : 14729249991,
      "deleted" : 399380
    },
    "store" : {
      "size_in_bytes" : 24956558350747,
      "reserved_in_bytes" : 0
    },
    "fielddata" : {
      "memory_size_in_bytes" : 2013896,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size_in_bytes" : 188214602,
      "total_count" : 322519208,
      "hit_count" : 6450503,
      "miss_count" : 316068705,
      "cache_size" : 220988,
      "cache_count" : 486371,
      "evictions" : 265383
    },
    "completion" : {
      "size_in_bytes" : 0
    },
    "segments" : {
      "count" : 280470,
      "memory_in_bytes" : 11468541452,
      "terms_memory_in_bytes" : 11203792224,
      "stored_fields_memory_in_bytes" : 150720048,
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory_in_bytes" : 71232,
      "points_memory_in_bytes" : 0,
      "doc_values_memory_in_bytes" : 113957948,
      "index_writer_memory_in_bytes" : 1069367056,
      "version_map_memory_in_bytes" : 2465,
      "fixed_bit_set_memory_in_bytes" : 45496,
      "max_unsafe_auto_id_timestamp" : 1679331682680,
      "file_sizes" : { }
    },
    "mappings" : {
      "field_types" : [
        {
          "name" : "alias",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "boolean",
          "count" : 934,
          "index_count" : 934
        },
        {
          "name" : "date",
          "count" : 2126,
          "index_count" : 2088
        },
        {
          "name" : "double",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "geo_point",
          "count" : 2074,
          "index_count" : 2074
        },
        {
          "name" : "half_float",
          "count" : 4146,
          "index_count" : 2073
        },
        {
          "name" : "ip",
          "count" : 2075,
          "index_count" : 2074
        },
        {
          "name" : "keyword",
          "count" : 140489,
          "index_count" : 2088
        },
        {
          "name" : "long",
          "count" : 1675,
          "index_count" : 1673
        },
        {
          "name" : "nested",
          "count" : 12,
          "index_count" : 12
        },
        {
          "name" : "object",
          "count" : 27974,
          "index_count" : 2088
        },
        {
          "name" : "text",
          "count" : 139937,
          "index_count" : 1673
        }
      ]
    },
    "analysis" : {
      "char_filter_types" : [ ],
      "tokenizer_types" : [ ],
      "filter_types" : [ ],
      "analyzer_types" : [ ],
      "built_in_char_filters" : [ ],
      "built_in_tokenizers" : [ ],
      "built_in_filters" : [ ],
      "built_in_analyzers" : [ ]
    },
    "versions" : [
      {
        "version" : "7.6.2",
        "index_count" : 5,
        "primary_shard_count" : 5,
        "total_primary_bytes" : 511534
      },
      {
        "version" : "7.7.1",
        "index_count" : 5,
        "primary_shard_count" : 5,
        "total_primary_bytes" : 2344624
      },
      {
        "version" : "7.8.0",
        "index_count" : 29,
        "primary_shard_count" : 29,
        "total_primary_bytes" : 107128695
      },
      {
        "version" : "7.12.0",
        "index_count" : 2121,
        "primary_shard_count" : 10768,
        "total_primary_bytes" : 12106961132047
      }
    ]
  },
  "nodes" : {
    "count" : {
      "total" : 25,
      "coordinating_only" : 0,
      "data" : 22,
      "data_cold" : 22,
      "data_content" : 22,
      "data_frozen" : 22,
      "data_hot" : 22,
      "data_warm" : 22,
      "ingest" : 6,
      "master" : 3,
      "ml" : 0,
      "remote_cluster_client" : 25,
      "transform" : 22,
      "voting_only" : 0
    },
    "versions" : [
      "7.12.0"
    ],
    "os" : {
      "available_processors" : 204,
      "allocated_processors" : 204,
      "names" : [
        {
          "name" : "Linux",
          "count" : 25
        }
      ],
      "pretty_names" : [
        {
          "pretty_name" : "Debian GNU/Linux 10 (buster)",
          "count" : 25
        }
      ],
      "architectures" : [
        {
          "arch" : "amd64",
          "count" : 25
        }
      ],
      "mem" : {
        "total_in_bytes" : 486110662656,
        "free_in_bytes" : 27505487872,
        "used_in_bytes" : 458605174784,
        "free_percent" : 6,
        "used_percent" : 94
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 156
      },
      "open_file_descriptors" : {
        "min" : 1037,
        "max" : 12189,
        "avg" : 8705
      }
    },
    "jvm" : {
      "max_uptime_in_millis" : 30797566735,
      "versions" : [
        {
          "version" : "15.0.1",
          "vm_name" : "OpenJDK 64-Bit Server VM",
          "vm_version" : "15.0.1+9",
          "vm_vendor" : "AdoptOpenJDK",
          "bundled_jdk" : true,
          "using_bundled_jdk" : true,
          "count" : 25
        }
      ],
      "mem" : {
        "heap_used_in_bytes" : 172233827976,
        "heap_max_in_bytes" : 253403070464
      },
      "threads" : 3952
    },
    "fs" : {
      "total_in_bytes" : 38483938881536,
      "free_in_bytes" : 12847313604608,
      "available_in_bytes" : 11278792450048
    },
    "plugins" : [ ],
    "network_types" : {
      "transport_types" : {
        "security4" : 25
      },
      "http_types" : {
        "security4" : 25
      }
    },
    "discovery_types" : {
      "zen" : 25
    },
    "packaging_types" : [
      {
        "flavor" : "default",
        "type" : "deb",
        "count" : 25
      }
    ],
    "ingest" : {
      "number_of_pipelines" : 2,
      "processor_stats" : {
        "gsub" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 0
        },
        "script" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 0
        }
      }
    }
  }
}

It looks like a number of nodes have an issue with heap space, as data collection from them failed due to circuit breaker errors. You appear to have around 10GB of heap assigned per node on average (heap_max_in_bytes of 253403070464 across 25 nodes is roughly 10.1GB each, although the ~5.6GB parent breaker limit reported by the failing cold nodes suggests those particular nodes run a 6GB heap) and a very large number of reasonably small shards (roughly 25TB of store across 16004 shards, so around 1.5GB average size).
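You can cross-check the per-node heap sizes with the cat nodes API, for example:

GET _cat/nodes?v&h=name,node.role,heap.max,heap.percent,ram.percent&s=name

If the cold nodes really are running ~6GB heaps, raising -Xms/-Xmx in their jvm.options would give the parent circuit breaker (which by default trips at 95% of heap) more headroom.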

If nodes suffer from long GC pauses (is there anything in the logs?), shards may be relocated away from them as a result. Given the large number of shards this can take a while, depending on the load the cluster is under and the amount of resources available, especially disk I/O.
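You can watch the progress of ongoing recoveries with, for example:

GET _cat/recovery?v&active_only=true&h=index,shard,stage,source_node,target_node,bytes_percent,time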

Do you have any entries in the Elasticsearch logs around nodes leaving and rejoining the cluster, e.g. due to long GC or other issues?

Also ...

... this version is just coming up on 2 years old and is well past EOL; you should upgrade to a supported version as a matter of urgency.

