ES cluster in yellow with shards flipping in and out of unassigned

I'm having a problem with my ES cluster. When I check the cluster status I'm seeing yellow. One minute I see unassigned_shards = 4, and the next minute it's 0:

curl -XGET 'http://elasticsearch:9200/_cluster/health?pretty'
{
"cluster_name" : "ES_cluster",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 8,
"number_of_data_nodes" : 5,
"active_primary_shards" : 106,
"active_shards" : 208,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 4,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 6,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 31,
"active_shards_percent_as_number" : 98.11320754716981
}

curl -XGET 'http://elasticsearch:9200/_cluster/health?pretty'
{
"cluster_name" : "ES_cluster",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 8,
"number_of_data_nodes" : 5,
"active_primary_shards" : 106,
"active_shards" : 208,
"relocating_shards" : 0,
"initializing_shards" : 4,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 5,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 118,
"active_shards_percent_as_number" : 98.11320754716981
}

It's flopping back and forth between initializing and unassigned shards. Earlier in the day another engineer deleted some large old indices, which might have caused this problem.
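
For reference, this is roughly how the affected shards, and the reason they keep going unassigned, can be watched. A sketch, assuming the cluster version exposes the unassigned.reason column in _cat/shards:

# List shards that are not STARTED, with the reason they were last unassigned
curl -XGET 'http://elasticsearch:9200/_cat/shards?v&h=index,shard,prirep,state,node,unassigned.reason' | grep -v STARTED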

Unlikely. What do the logs on the master node show?

Lots of warnings about shards failing to recover because the index the other engineer deleted is missing on disk:
[2016-12-21 23:45:02,100][WARN ][cluster.action.shard ] [elasticsearch-master0.node.netprod.vci] [logstash-2016.12.02][2] received shard failed for [logstash-2016.12.02][2], node[u7ReG8ycQ1udVr8j-5-nyA], [R], v[102954], s[INITIALIZING], a[id=bLbgpONHQKe5YDyGa3Qmmw], unassigned_info[[reason=ALLOCATION_FAILED], at[2016-12-21T23:45:01.980Z], details[failed recovery, failure RecoveryFailedException[[logstash-2016.12.02][2]: Recovery failed from {elasticsearch-data1.node.netprod.vci}{6EfoOppSQ_q-ANHPNP2H0g}{192.99.99.59}{192.99.99.59:9300}{master=false} into {elasticsearch-data0.node.netprod.vci}{mVot4nbASjqU4Ij1eGAhwA}{192.99.99.44}{192.99.99.44:9300}{master=false}]; nested: RemoteTransportException[[elasticsearch-data1.node.netprod.vci][192.99.99.59:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [0] files with total size of [0b]]; nested: NoSuchFileException[/var/log/elasticsearch/netprod/nodes/0/indices/logstash-2016.12.02/2/index]; ]], indexUUID [OVg7952VR8uenasSv-16iw], message [failed recovery], failure [RecoveryFailedException[[logstash-2016.12.02][2]: Recovery failed from {elasticsearch-data1.node.netprod.vci}{6EfoOppSQ_q-ANHPNP2H0g}{192.99.99.59}{192.99.99.59:9300}{master=false} into {elasticsearch-data4.node.netprod.vci}{u7ReG8ycQ1udVr8j-5-nyA}{192.99.99.50}{192.99.99.50:9300}{master=false}]; nested: RemoteTransportException[[elasticsearch-data1.node.netprod.vci][192.99.99.59:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [0] files with total size of [0b]]; nested: NoSuchFileException[/var/log/elasticsearch/netprod/nodes/0/indices/logstash-2016.12.02/2/index]; ]
RecoveryFailedException[[logstash-2016.12.02][2]: Recovery failed from {elasticsearch-data1.node.netprod.vci}{6EfoOppSQ_q-ANHPNP2H0g}{192.99.99.59}{192.99.99.59:9300}{master=false} into {elasticsearch-data4.node.netprod.vci}{u7ReG8ycQ1udVr8j-5-nyA}{192.99.99.50}{192.99.99.50:9300}{master=false}]; nested: RemoteTransportException[[elasticsearch-data1.node.netprod.vci][192.99.99.59:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [0] files with total size of [0b]]; nested: NoSuchFileException[/var/log/elasticsearch/netprod/nodes/0/indices/logstash-2016.12.02/2/index];
at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:258)
at org.elasticsearch.indices.recovery.RecoveryTarget.access$1100(RecoveryTarget.java:69)
at org.elasticsearch.indices.recovery.RecoveryTarget$RecoveryRunner.doRun(RecoveryTarget.java:508)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: RemoteTransportException[[elasticsearch-data1.node.netprod.vci][192.99.99.59:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [0] files with total size of [0b]]; nested: NoSuchFileException[/var/log/elasticsearch/netprod/nodes/0/indices/logstash-2016.12.02/2/index];
Caused by: [logstash-2016.12.02][[logstash-2016.12.02][2]] RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [0] files with total size of [0b]]; nested: NoSuchFileException[/var/log/elasticsearch/netprod/nodes/0/indices/logstash-2016.12.02/2/index];
at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:135)
at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:135)
at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:126)
at org.elasticsearch.indices.recovery.RecoverySource.access$200(RecoverySource.java:52)
at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:135)
at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:132)
at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:299)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: [logstash-2016.12.02][[logstash-2016.12.02][2]] RecoverFilesRecoveryException[Failed to transfer [0] files with total size of [0b]]; nested: NoSuchFileException[/var/log/elasticsearch/netprod/nodes/0/indices/logstash-2016.12.02/2/index];

How were the indices deleted?
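
If they were removed from disk rather than through the delete API, the cluster state may still reference them, which would explain the recovery attempts against files that no longer exist. A rough sketch of how to check and clean that up, assuming the index name from the log above:

# Check whether the index is still present in the cluster metadata
curl -XGET 'http://elasticsearch:9200/_cat/indices/logstash-2016.12.02?v'

# If it is, deleting it through the API should remove the dangling shard copies
curl -XDELETE 'http://elasticsearch:9200/logstash-2016.12.02'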
