ES cluster in yellow with shards flipping in and out of unassigned

I'm having a problem with my ES cluster. When I check the cluster status I'm seeing yellow. One minute I see unassigned_shards = 4, and the next minute it's 0:

curl -XGET 'http://elasticsearch:9200/_cluster/health?pretty'
{
"cluster_name" : "ES_cluster",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 8,
"number_of_data_nodes" : 5,
"active_primary_shards" : 106,
"active_shards" : 208,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 4,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 6,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 31,
"active_shards_percent_as_number" : 98.11320754716981
}

curl -XGET 'http://elasticsearch:9200/_cluster/health?pretty'
{
"cluster_name" : "ES_cluster",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 8,
"number_of_data_nodes" : 5,
"active_primary_shards" : 106,
"active_shards" : 208,
"relocating_shards" : 0,
"initializing_shards" : 4,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 5,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 118,
"active_shards_percent_as_number" : 98.11320754716981
}

It's flopping back and forth between initializing and unassigned shards. Earlier in the day another engineer deleted some large old indices, which might have caused this problem.
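
For reference, this is roughly how the affected shards, and the reason they keep going unassigned, can be watched. A sketch, assuming the cluster version exposes the unassigned.reason column in _cat/shards:

# List shards that are not STARTED, with the reason they were last unassigned
curl -XGET 'http://elasticsearch:9200/_cat/shards?v&h=index,shard,prirep,state,node,unassigned.reason' | grep -v STARTED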

Unlikely. What do the logs on the master node show?

Lots of warnings about shards failing to recover because the index the other engineer deleted is missing on disk:
[2016-12-21 23:45:02,100][WARN ][cluster.action.shard ] [elasticsearch-master0.node.netprod.vci] [logstash-2016.12.02][2] received shard failed for [logstash-2016.12.02][2], node[u7ReG8ycQ1udVr8j-5-nyA], [R], v[102954], s[INITIALIZING], a[id=bLbgpONHQKe5YDyGa3Qmmw], unassigned_info[[reason=ALLOCATION_FAILED], at[2016-12-21T23:45:01.980Z], details[failed recovery, failure RecoveryFailedException[[logstash-2016.12.02][2]: Recovery failed from {elasticsearch-data1.node.netprod.vci}{6EfoOppSQ_q-ANHPNP2H0g}{192.99.99.59}{192.99.99.59:9300}{master=false} into {elasticsearch-data0.node.netprod.vci}{mVot4nbASjqU4Ij1eGAhwA}{192.99.99.44}{192.99.99.44:9300}{master=false}]; nested: RemoteTransportException[[elasticsearch-data1.node.netprod.vci][192.99.99.59:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [0] files with total size of [0b]]; nested: NoSuchFileException[/var/log/elasticsearch/netprod/nodes/0/indices/logstash-2016.12.02/2/index]; ]], indexUUID [OVg7952VR8uenasSv-16iw], message [failed recovery], failure [RecoveryFailedException[[logstash-2016.12.02][2]: Recovery failed from {elasticsearch-data1.node.netprod.vci}{6EfoOppSQ_q-ANHPNP2H0g}{192.99.99.59}{192.99.99.59:9300}{master=false} into {elasticsearch-data4.node.netprod.vci}{u7ReG8ycQ1udVr8j-5-nyA}{192.99.99.50}{192.99.99.50:9300}{master=false}]; nested: RemoteTransportException[[elasticsearch-data1.node.netprod.vci][192.99.99.59:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [0] files with total size of [0b]]; nested: NoSuchFileException[/var/log/elasticsearch/netprod/nodes/0/indices/logstash-2016.12.02/2/index]; ]
RecoveryFailedException[[logstash-2016.12.02][2]: Recovery failed from {elasticsearch-data1.node.netprod.vci}{6EfoOppSQ_q-ANHPNP2H0g}{192.99.99.59}{192.99.99.59:9300}{master=false} into {elasticsearch-data4.node.netprod.vci}{u7ReG8ycQ1udVr8j-5-nyA}{192.99.99.50}{192.99.99.50:9300}{master=false}]; nested: RemoteTransportException[[elasticsearch-data1.node.netprod.vci][192.99.99.59:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [0] files with total size of [0b]]; nested: NoSuchFileException[/var/log/elasticsearch/netprod/nodes/0/indices/logstash-2016.12.02/2/index];
at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:258)
at org.elasticsearch.indices.recovery.RecoveryTarget.access$1100(RecoveryTarget.java:69)
at org.elasticsearch.indices.recovery.RecoveryTarget$RecoveryRunner.doRun(RecoveryTarget.java:508)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: RemoteTransportException[[elasticsearch-data1.node.netprod.vci][192.99.99.59:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [0] files with total size of [0b]]; nested: NoSuchFileException[/var/log/elasticsearch/netprod/nodes/0/indices/logstash-2016.12.02/2/index];
Caused by: [logstash-2016.12.02][[logstash-2016.12.02][2]] RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [0] files with total size of [0b]]; nested: NoSuchFileException[/var/log/elasticsearch/netprod/nodes/0/indices/logstash-2016.12.02/2/index];
at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:135)
at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:135)
at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:126)
at org.elasticsearch.indices.recovery.RecoverySource.access$200(RecoverySource.java:52)
at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:135)
at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:132)
at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:299)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: [logstash-2016.12.02][[logstash-2016.12.02][2]] RecoverFilesRecoveryException[Failed to transfer [0] files with total size of [0b]]; nested: NoSuchFileException[/var/log/elasticsearch/netprod/nodes/0/indices/logstash-2016.12.02/2/index];

How were the indices deleted?
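
If they were removed from disk rather than through the delete API, the cluster state may still reference them, which would explain the recovery attempts against files that no longer exist. A rough sketch of how to check and clean that up, assuming the index name from the log above:

# Check whether the index is still present in the cluster metadata
curl -XGET 'http://elasticsearch:9200/_cat/indices/logstash-2016.12.02?v'

# If it is, deleting it through the API should remove the dangling shard copies
curl -XDELETE 'http://elasticsearch:9200/logstash-2016.12.02'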
