Cluster stuck in Initializing

Hi,

EDIT: running ES 1.7
EDIT2: I launched a new node with the same settings as the nodes not receiving any data. The new node also gets no shards, and the cluster is still stuck. One node keeps producing the OutOfMemoryErrors, and I'm not sure what will happen if I restart it.

We run a large logging cluster with a few billion documents.
Last night our cluster had some issues. I'm still looking for the cause,
but that's not my main concern at the moment.

The cluster was still yellow this morning, with a few unassigned shards left after doing its best to recover.
I tried everything I could think of to allocate them (roughly the commands sketched below):
- turning allocation off and on at the index and cluster level
- setting replicas to 0 and back to 1
- restarting the node
- forcing allocation with a script

But nothing seems to work.
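For reference, the allocation toggle and the forced allocation looked roughly like this (index name, shard number, and node name are placeholders, not the real values):

# re-enable allocation cluster-wide
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient" : { "cluster.routing.allocation.enable" : "all" }
}'

# force one unassigned replica onto a specific node (placeholder index/shard/node)
curl -XPOST 'http://localhost:9200/_cluster/reroute' -d '{
  "commands" : [ {
    "allocate" : {
      "index" : "logstash-2015.09.01",
      "shard" : 3,
      "node" : "data-node-01",
      "allow_primary" : false
    }
  } ]
}'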

After restarting one node with stuck shards, the unassigned count went from 35 to 485 (all on this node). Now no replica shard can be assigned.
Disk usage is fine.
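For what it's worth, disk usage per node was checked with something like:

curl 'http://localhost:9200/_cat/allocation?v'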

{ "cluster_name" : "prod", "status" : "yellow", "timed_out" : false, "number_of_nodes" : 12, "number_of_data_nodes" : 6, "active_primary_shards" : 4947, "active_shards" : 9380, "relocating_shards" : 4, "initializing_shards" : 38, "unassigned_shards" : 447, "delayed_unassigned_shards" : 0, "number_of_pending_tasks" : 0, "number_of_in_flight_fetch" : 0 }

The 38 initializing shards are the ones I forced; they are stuck now.
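I'm watching them with something like the cat recovery endpoint, but they never progress:

curl 'http://localhost:9200/_cat/recovery?v'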

I'm also seeing this exception, which we've never had before:

Failed to send error message back to client for action [internal:index/shard/recovery/start_recovery]
java.lang.OutOfMemoryError: Java heap space

There are no logs about long GC times, and the cluster is responsive and still working perfectly fine.
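To keep an eye on heap across the nodes I'm polling JVM stats roughly like this:

curl 'http://localhost:9200/_nodes/stats/jvm?pretty'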

And also this one:

Actual Exception
org.elasticsearch.indices.recovery.DelayRecoveryException: source node does not have the shard listed in its state as allocated on the node
    at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:108)
    at org.elasticsearch.indices.recovery.RecoverySource.access$200(RecoverySource.java:49)
    at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:146)
    at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:132)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:279)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

It would be OK to do a full cluster restart, IF I knew that it wouldn't get stuck for all shards (because after restarting one node, all shards of that node are stuck).

Help would be appreciated.