Unassigned Replica Shards

James_Stallings · November 1, 2022, 7:57pm

How often does the control loop run that tries to reassign an unassigned replica shard? Is this documented anywhere? Is there source code I can look at?

warkolm · November 1, 2022, 8:38pm

Welcome to our community!

What problem are you seeing?

James_Stallings · November 1, 2022, 8:55pm

We had a cluster running in yellow status for a few weeks. I understand why, but I can't find documentation on how often the cluster tries to repair this index. The logs below appear to show a 3-week gap between attempts. Is that accurate?

[2022-10-07T08:16:41,167][WARN ][o.e.a.b.TransportShardBulkAction] [hx3gz6o] [[xxxx][0]] failed to perform __PATH__[s] on replica [xxx][0], node[xxx], [R], s[STARTED], a[id=xxx]
RemoteTransportException[[xxx][__IP__][__PATH__[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [__PATH__], which is larger than the limit of [__PATH__], real usage: [__PATH__], new bytes reserved: [__PATH__]];
Caused by: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [__PATH__], which is larger than the limit of [__PATH__], real usage: [__PATH__], new bytes reserved: [__PATH__]]
[2022-10-07T08:16:41,187][INFO ][o.e.c.r.a.AllocationService] [DEY_Qhr] Cluster health status changed from [GREEN] to [YELLOW] (reason: [shards failed [[xxx][0]] ...]).
[2022-10-28T06:18:29,136][WARN ][o.e.a.b.TransportShardBulkAction] [hx3gz6o] [[xxxx][0]] failed to perform __PATH__[s] on replica [xxx][0], node[xxx], [R], s[STARTED], a[id=xxx]
RemoteTransportException[[DEY_Qhr][__IP__][__PATH__[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [__PATH__], which is larger than the limit of [__PATH__], real usage: [__PATH__], new bytes reserved: [__PATH__]];
Caused by: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [__PATH__], which is larger than the limit of [__PATH__], real usage: [__PATH__], new bytes reserved: [__PATH__]]
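
To check the "3-week gap" reading, here is a minimal sketch that computes the time between the two WARN entries in the log excerpt above. The two timestamps are copied from those entries; the helper name and format constant are only illustrative. Note that this measures the gap between two failed replica writes, which is not necessarily the gap between reassignment attempts: the second entry again reports the replica as s[STARTED], which suggests it had been assigned again at some point in between.

```python
from datetime import datetime

# Timestamp layout used in the log lines above, e.g. 2022-10-07T08:16:41,167
LOG_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S,%f"


def parse_log_time(stamp):
    """Parse a timestamp in the format used by the log lines above."""
    return datetime.strptime(stamp, LOG_TIME_FORMAT)


# Timestamps of the two TransportShardBulkAction warnings in the excerpt.
first_failure = parse_log_time("2022-10-07T08:16:41,167")
second_failure = parse_log_time("2022-10-28T06:18:29,136")

gap = second_failure - first_failure
print("Gap between the two replica failures:", gap)
```

The result is just under 21 days, so "3 weeks" is a fair description of the distance between the two logged failures.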

warkolm · November 1, 2022, 8:56pm

It should be checking more or less continuously.

What is the output from the _cluster/stats?pretty&human API? It might give us some further insight.
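
Besides the stats output, two other requests may help when a replica stays unassigned; the following is a sketch under stated assumptions, not a diagnosis of this cluster. The cluster allocation explain API reports why a given shard copy is unassigned, and `POST _cluster/reroute?retry_failed=true` asks the master to retry shards whose allocation has already failed `index.allocation.max_retries` times (5 by default, as far as I know), after which they are not retried automatically. The base URL, the lack of authentication, and the function names below are assumptions made for illustration.

```python
import json
import urllib.request

# Assumption: the cluster is reachable here without authentication.
BASE_URL = "http://localhost:9200"
JSON_HEADERS = {"Content-Type": "application/json"}


def explain_request(index, shard, primary=False):
    """Build a request asking why one shard copy is unassigned."""
    body = json.dumps({"index": index, "shard": shard, "primary": primary})
    return urllib.request.Request(
        BASE_URL + "/_cluster/allocation/explain",
        data=body.encode("utf-8"),
        headers=JSON_HEADERS,
        method="POST",
    )


def retry_failed_request():
    """Build a request asking the master to retry allocations that gave up."""
    return urllib.request.Request(
        BASE_URL + "/_cluster/reroute?retry_failed=true",
        data=b"{}",
        headers=JSON_HEADERS,
        method="POST",
    )


def send(request):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage against a live cluster (index name taken from the redacted logs):
#   print(json.dumps(send(explain_request("xxx", 0)), indent=2))
#   print(send(retry_failed_request()).get("acknowledged"))
```

The explain response normally includes the reason the shard became unassigned and a per-node decision, which is usually more direct than inferring the cause from log gaps.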

James_Stallings · November 1, 2022, 9:05pm

{
  "_nodes" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "cluster_name" : "xxxx",
  "cluster_uuid" : "2mpozck8RqKcMivuvxTdEg",
  "timestamp" : 1667336617127,
  "status" : "yellow",
  "indices" : {
    "count" : 7,
    "shards" : {
      "total" : 29,
      "primaries" : 15,
      "replication" : 0.9333333333333333,
      "index" : {
        "shards" : {
          "min" : 1,
          "max" : 10,
          "avg" : 4.142857142857143
        },
        "primaries" : {
          "min" : 1,
          "max" : 5,
          "avg" : 2.142857142857143
        },
        "replication" : {
          "min" : 0.0,
          "max" : 1.0,
          "avg" : 0.8571428571428571
        }
      }
    },
    "docs" : {
      "count" : 238719302,
      "deleted" : 91184055
    },
    "store" : {
      "size" : "1.2tb",
      "size_in_bytes" : 1335819766994
    },
    "fielddata" : {
      "memory_size" : "0b",
      "memory_size_in_bytes" : 0,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size" : "86.2mb",
      "memory_size_in_bytes" : 90392008,
      "total_count" : 1269546,
      "hit_count" : 640867,
      "miss_count" : 628679,
      "cache_size" : 11724,
      "cache_count" : 11724,
      "evictions" : 0
    },
    "completion" : {
      "size" : "0b",
      "size_in_bytes" : 0
    },
    "segments" : {
      "count" : 451,
      "memory" : "87mb",
      "memory_in_bytes" : 91330236,
      "terms_memory" : "4mb",
      "terms_memory_in_bytes" : 4231872,
      "stored_fields_memory" : "45.5mb",
      "stored_fields_memory_in_bytes" : 47808096,
      "term_vectors_memory" : "36.1mb",
      "term_vectors_memory_in_bytes" : 37947600,
      "norms_memory" : "377.1kb",
      "norms_memory_in_bytes" : 386240,
      "points_memory" : "0b",
      "points_memory_in_bytes" : 0,
      "doc_values_memory" : "934kb",
      "doc_values_memory_in_bytes" : 956428,
      "index_writer_memory" : "4.8mb",
      "index_writer_memory_in_bytes" : 5091256,
      "version_map_memory" : "644b",
      "version_map_memory_in_bytes" : 644,
      "fixed_bit_set" : "71.2mb",
      "fixed_bit_set_memory_in_bytes" : 74750944,
      "max_unsafe_auto_id_timestamp" : -1,
      "file_sizes" : { }
    }
  },
  "nodes" : {
    "count" : {
      "total" : 3,
      "data" : 3,
      "coordinating_only" : 0,
      "master" : 3,
      "ingest" : 3
    },
    "versions" : [ "6.8.0" ],
    "os" : {
      "available_processors" : 12,
      "allocated_processors" : 12,
      "names" : [ {
        "count" : 3
      } ],
      "pretty_names" : [ {
        "count" : 3
      } ],
      "mem" : {
        "total" : "45.8gb",
        "total_in_bytes" : 49262174208,
        "free" : "4.4gb",
        "free_in_bytes" : 4803670016,
        "used" : "41.4gb",
        "used_in_bytes" : 44458504192,
        "free_percent" : 10,
        "used_percent" : 90
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 21
      },
      "open_file_descriptors" : {
        "min" : 1584,
        "max" : 1717,
        "avg" : 1628
      }
    },
    "jvm" : {
      "max_uptime" : "54.8d",
      "max_uptime_in_millis" : 4738694465,
      "mem" : {
        "heap_used" : "11.4gb",
        "heap_used_in_bytes" : 12293693848,
        "heap_max" : "23.9gb",
        "heap_max_in_bytes" : 25665208320
      },
      "threads" : 494
    },
    "fs" : {
      "total" : "2.9tb",
      "total_in_bytes" : 3246361178112,
      "free" : "1.1tb",
      "free_in_bytes" : 1273565753344,
      "available" : "1tb",
      "available_in_bytes" : 1108588687360
    },
    "network_types" : {
      "transport_types" : {
        "netty4" : 3
      },
      "http_types" : {
        "filter-jetty" : 3
      }
    }
  }
}
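
The shard figures in this output can be read as simple arithmetic. The sketch below assumes that the stats count only assigned shard copies and that "replication" is the number of replica copies divided by the number of primaries; the variable names are illustrative and the numbers are copied from the output above.

```python
# Figures copied from the "indices" section of the stats output above.
total_shards = 29
primaries = 15
index_count = 7
avg_index_replication = 0.8571428571428571

# Replica copies that are currently counted by the stats.
replicas = total_shards - primaries

# Cluster-wide replication ratio: replica copies per primary.
replication = replicas / primaries

# Sum of the per-index replication ratios, recovered from the average.
sum_of_index_ratios = round(avg_index_replication * index_count, 6)

print("replica copies:", replicas)
print("replication ratio:", replication)
print("sum of per-index ratios:", sum_of_index_ratios)
```

Under those assumptions there are 14 replica copies for 15 primaries, which matches the reported 0.93. The per-index ratios sum to 6 across 7 indices with a minimum of 0.0 and a maximum of 1.0, which is consistent with six indices being fully replicated and a single one-primary index having no assigned replica.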

James_Stallings · November 2, 2022, 1:05pm

@warkolm any thoughts?

warkolm · November 2, 2022, 9:56pm

Do you have logs from your master node you could share? The more the better.

Please upgrade; this version is EOL.

James_Stallings · November 4, 2022, 12:47pm

We are working on an upgrade soon. I ended up rebuilding the cluster since we needed to get out of the yellow zone. Thanks for the help.

system · December 2, 2022, 12:48pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
