I have a cluster running with 10 data nodes, 3 master nodes, and 3 client nodes. I am facing an issue on one of the data nodes where no shards are able to get assigned to it.
_cluster/allocation/explain gives the following output for that node:
{
  "node_id" : "TULFVEcrTxCfINXVazprLA",
  "node_name" : "data-1",
  "transport_address" : "192.168.16.126:9300",
  "node_decision" : "throttled",
  "weight_ranking" : 1,
  "deciders" : [
    {
      "decider" : "throttling",
      "decision" : "THROTTLE",
      "explanation" : "reached the limit of ongoing initial primary recoveries [4], cluster setting [cluster.routing.allocation.node_initial_primaries_recoveries=4]"
    }
  ]
}
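For reference, this is roughly how I pulled the explain output, and the setting the decider refers to. I was considering temporarily raising that recovery throttle (I am not sure whether that is the right fix here, so treat this as a sketch; the host/port are placeholders):

# allocation explain for the problematic node
curl -s -XGET 'http://localhost:9200/_cluster/allocation/explain?pretty'

# possible temporary bump of the initial primary recovery throttle (default is 4)
curl -s -XPUT 'http://localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '
{
  "transient": {
    "cluster.routing.allocation.node_initial_primaries_recoveries": 8
  }
}'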
I see the following log message frequently in the data-1 logs:
{"type":"log","host":"Elasticsearch-data-1","level":"WARN","time": "2021-09-18T15:03:32.407Z","logger":"o.e.d.z.ZenDiscovery","timezone":"UTC","marker":"[Elasticsearch-data-1] ","log":"dropping pending state [[uuid[p2MPdq59QCKtduezFv9Y2A], v[175229293], m[NAlqPclnQY-25G9p6_4mBA]]]. more than [25] pending states."}
Also, even though shards are getting assigned to the other data nodes, the document count for them is still zero, and I did not find any relevant message in the logs that could explain this behaviour. There are also continuous garbage collector warnings in the logs:
{"type":"lob94d-ckhfw","level":"WARN","time": "2021-12-17T03:51:10.869Z","logger":"o.e.m.j.JvmGcMonitorService","timezone":"UTC","marker":"[elasticsearch-master-55ff74b94d-ckhfw] ","log":"[gc][36006457] overhead, spent [754ms] collecting in the last [1.2s]"}
{"type":"log","host":"elasticsearch-master-55ff74b94d-ckhfw","level":"WARN","time": "2021-12-17T04:09:04.407Z","logger":"o.e.m.j.JvmGcMonitorService","timezone":"UTC","marker":"[elasticsearch-master-55ff74b94d-ckhfw] ","log":"[gc][young][36007530][693943] duration [1s], collections [1]/[1.2s], total [1s]/[13.4h], memory [8gb]->[7.8gb]/[15.9gb], all_pools {[young] [264.4mb]->[4.1mb]/[266.2mb]}{[survivor] [5.1mb]->[3.7mb]/[33.2mb]}{[old] [7.8gb]->[7.8gb]/[15.6gb]}"}
Note: the JVM heap is set to 32 GB, since I have set it to the maximum allowed value. Can this cause such an issue?
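For what it is worth, this is how I have been checking the heap each node actually ended up with, to confirm whether the 32 GB setting really took effect everywhere (host is a placeholder):

curl -s -XGET 'http://localhost:9200/_cat/nodes?v&h=name,node.role,heap.max,heap.percent,ram.max'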
Please look into this and suggest what could be causing it and how to overcome it.