ES Cluster in Yellow with Shards Flipping Between Unassigned and Initializing


(Joseph Quinnett) #1

I'm having a problem with my ES cluster. When I check cluster status I'm seeing yellow... one minute unassigned_shards is 4, and the next minute it's 0.

curl -XGET 'http://elasticsearch:9200/_cluster/health?pretty'
{
  "cluster_name" : "ES_cluster",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 8,
  "number_of_data_nodes" : 5,
  "active_primary_shards" : 106,
  "active_shards" : 208,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 4,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 6,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 31,
  "active_shards_percent_as_number" : 98.11320754716981
}

And a minute later:

curl -XGET 'http://elasticsearch:9200/_cluster/health?pretty'
{
  "cluster_name" : "ES_cluster",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 8,
  "number_of_data_nodes" : 5,
  "active_primary_shards" : 106,
  "active_shards" : 208,
  "relocating_shards" : 0,
  "initializing_shards" : 4,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 5,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 118,
  "active_shards_percent_as_number" : 98.11320754716981
}

It's flipping back and forth between initializing_shards and unassigned_shards. Earlier in the day another engineer deleted some large old indices, which might have caused this problem.
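To narrow down which shards are cycling, the cat shards API can show the state and unassigned reason per shard (a sketch, assuming the same host as above; the unassigned.reason column is available in ES 2.x and later):

# List only shards that are not started, with the reason the master recorded
curl -XGET 'http://elasticsearch:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason' | grep -v STARTED

Shards stuck in an allocation retry loop show up here as UNASSIGNED with reason ALLOCATION_FAILED, then briefly as INITIALIZING while the retry runs.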


(Mark Walkom) #2

Unlikely; deleting old indices shouldn't cause this on its own.

What do the logs on the master node show?


(Joseph Quinnett) #3

Lots of warnings about shard recovery failing because the index the other engineer deleted is missing:
[2016-12-21 23:45:02,100][WARN ][cluster.action.shard ] [elasticsearch-master0.node.netprod.vci] [logstash-2016.12.02][2] received shard failed for [logstash-2016.12.02][2], node[u7ReG8ycQ1udVr8j-5-nyA], [R], v[102954], s[INITIALIZING], a[id=bLbgpONHQKe5YDyGa3Qmmw], unassigned_info[[reason=ALLOCATION_FAILED], at[2016-12-21T23:45:01.980Z], details[failed recovery, failure RecoveryFailedException[[logstash-2016.12.02][2]: Recovery failed from {elasticsearch-data1.node.netprod.vci}{6EfoOppSQ_q-ANHPNP2H0g}{192.99.99.59}{192.99.99.59:9300}{master=false} into {elasticsearch-data0.node.netprod.vci}{mVot4nbASjqU4Ij1eGAhwA}{192.99.99.44}{192.99.99.44:9300}{master=false}]; nested: RemoteTransportException[[elasticsearch-data1.node.netprod.vci][192.99.99.59:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [0] files with total size of [0b]]; nested: NoSuchFileException[/var/log/elasticsearch/netprod/nodes/0/indices/logstash-2016.12.02/2/index]; ]], indexUUID [OVg7952VR8uenasSv-16iw], message [failed recovery], failure [RecoveryFailedException[[logstash-2016.12.02][2]: Recovery failed from {elasticsearch-data1.node.netprod.vci}{6EfoOppSQ_q-ANHPNP2H0g}{192.99.99.59}{192.99.99.59:9300}{master=false} into {elasticsearch-data4.node.netprod.vci}{u7ReG8ycQ1udVr8j-5-nyA}{192.99.99.50}{192.99.99.50:9300}{master=false}]; nested: RemoteTransportException[[elasticsearch-data1.node.netprod.vci][192.99.99.59:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [0] files with total size of [0b]]; nested: NoSuchFileException[/var/log/elasticsearch/netprod/nodes/0/indices/logstash-2016.12.02/2/index]; ]
RecoveryFailedException[[logstash-2016.12.02][2]: Recovery failed from {elasticsearch-data1.node.netprod.vci}{6EfoOppSQ_q-ANHPNP2H0g}{192.99.99.59}{192.99.99.59:9300}{master=false} into {elasticsearch-data4.node.netprod.vci}{u7ReG8ycQ1udVr8j-5-nyA}{192.99.99.50}{192.99.99.50:9300}{master=false}]; nested: RemoteTransportException[[elasticsearch-data1.node.netprod.vci][192.99.99.59:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [0] files with total size of [0b]]; nested: NoSuchFileException[/var/log/elasticsearch/netprod/nodes/0/indices/logstash-2016.12.02/2/index];
at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:258)
at org.elasticsearch.indices.recovery.RecoveryTarget.access$1100(RecoveryTarget.java:69)
at org.elasticsearch.indices.recovery.RecoveryTarget$RecoveryRunner.doRun(RecoveryTarget.java:508)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: RemoteTransportException[[elasticsearch-data1.node.netprod.vci][192.99.99.59:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [0] files with total size of [0b]]; nested: NoSuchFileException[/var/log/elasticsearch/netprod/nodes/0/indices/logstash-2016.12.02/2/index];
Caused by: [logstash-2016.12.02][[logstash-2016.12.02][2]] RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [0] files with total size of [0b]]; nested: NoSuchFileException[/var/log/elasticsearch/netprod/nodes/0/indices/logstash-2016.12.02/2/index];
at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:135)
at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:135)
at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:126)
at org.elasticsearch.indices.recovery.RecoverySource.access$200(RecoverySource.java:52)
at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:135)
at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:132)
at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:299)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: [logstash-2016.12.02][[logstash-2016.12.02][2]] RecoverFilesRecoveryException[Failed to transfer [0] files with total size of [0b]]; nested: NoSuchFileException[/var/log/elasticsearch/netprod/nodes/0/indices/logstash-2016.12.02/2/index];
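The NoSuchFileException means the shard's files are gone from the data path while the index is still in the cluster state, so every recovery attempt fails and the master re-queues the shard. A quick way to confirm the mismatch (a sketch, reusing the host and path from the log above):

# Does the master still know about the index?
curl -XGET 'http://elasticsearch:9200/_cat/indices/logstash-2016.12.02?v'

# Do the shard files still exist on disk? (run this on the data node)
ls /var/log/elasticsearch/netprod/nodes/0/indices/logstash-2016.12.02

If the first command returns the index but the directory is missing, the data was removed from the filesystem rather than through the API.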


(Mark Walkom) #4

How were the indices deleted?
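For context, the supported way to drop an index is the delete index API, e.g. (using the index from the log above):

curl -XDELETE 'http://elasticsearch:9200/logstash-2016.12.02'

If the shard directories were instead removed from the filesystem while the nodes were running, the index stays in the cluster state and the master keeps trying, and failing, to re-replicate those shards, which matches the errors above.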


(system) #5

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.