Why Cluster Status is Red, please help


(Thy Fere) #1

Hi,

I ran curl -XGET localhost:9200/_cluster/health?pretty=true and got the following result:

"cluster_name" : "elasticsearch",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 504,
"active_shards" : 504,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 688,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 1,
"number_of_in_flight_fetch" : 0

I am running Elasticsearch and Kibana on a Windows box. The following are my current log entries:

[2016-01-20 16:30:51,130][WARN ][cluster.action.shard ] [ABC-Elastic1] [logstash-2016.01.01][2] received shard failed for [logstash-2016.01.01][2], node[j_ZQyQsvRRGBvPo1NYifeA], [P], s[INITIALIZING], unassigned_info[[reason=ALLOCATION_FAILED], at[2016-01-20T13:30:45.254Z], details[shard failure [failed recovery][IndexShardGatewayRecoveryException[[logstash-2016.01.01][2] failed recovery]; nested: ElasticsearchException[failed to obtain write log pre translog recovery]; nested: LockObtainFailedException[Lock obtain timed out: NativeFSLock@\xyz-synology1.abc.xyz.edu\syslog\elasticsearch\nodes\0\indices\logstash-2016.01.01\2\index\write.lock]; ]]], indexUUID [XnyU0jKKRJ6vsz7j3xdRrA], reason [shard failure [failed recovery][IndexShardGatewayRecoveryException[[logstash-2016.01.01][2] failed recovery]; nested: ElasticsearchException[failed to obtain write log pre translog recovery]; nested: LockObtainFailedException[Lock obtain timed out: NativeFSLock@\xyz-synology1.abc.xyz.edu\syslog\elasticsearch\nodes\0\indices\logstash-2016.01.01\2\index\write.lock]; ]]
[2016-01-20 16:30:57,928][INFO ][monitor.jvm ] [ABC-Elastic1] [gc][old][1343][222] duration [6.5s], collections [1]/[7s], total [6.5s]/[5.2m], memory [2.8gb]->[2.8gb]/[2.9gb], all_pools {[young] [146.4kb]->[8mb]/[133.1mb]}{[survivor] [0b]->[0b]/[16.6mb]}{[old] [2.8gb]->[2.8gb]/[2.8gb]}
[2016-01-20 16:30:58,459][INFO ][cluster.metadata ] [ABC-Elastic1] [logstash-2016.01.20] update_mapping [nxlog-json] (dynamic)
[2016-01-20 16:31:04,835][INFO ][monitor.jvm ] [ABC-Elastic1] [gc][old][1345][223] duration [5.5s], collections [1]/[5.9s], total [5.5s]/[5.3m], memory [2.9gb]->[2.8gb]/[2.9gb], all_pools {[young] [133.1mb]->[12.6mb]/[133.1mb]}{[survivor] [15mb]->[0b]/[16.6mb]}{[old] [2.8gb]->[2.8gb]/[2.8gb]}
[2016-01-20 16:31:05,194][INFO ][cluster.metadata ] [ABC-Elastic1] [logstash-2016.01.20] update_mapping [nxlog-json] (dynamic)
[2016-01-20 16:31:06,788][WARN ][indices.cluster ] [ABC-Elastic1] [[logstash-2016.01.01][2]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [logstash-2016.01.01][2] failed recovery
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:162)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.ElasticsearchException: failed to obtain write log pre translog recovery
at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:228)
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:112)
... 3 more
Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@\xyz-synology1.abc.xyz.edu\syslog\elasticsearch\nodes\0\indices\logstash-2016.01.01\2\index\write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:89)
at org.elasticsearch.common.lucene.Lucene.acquireLock(Lucene.java:149)
at org.elasticsearch.common.lucene.Lucene.acquireWriteLock(Lucene.java:140)
at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:183)
... 4 more


(Mark Walkom) #2

Basically, it's red because you have unassigned primaries.

It looks like you are using NFS and it's timing out.
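
To see which primaries are unassigned, one option is to read the output of the `_cat/shards` API and keep only the rows for primary shards in state UNASSIGNED. The sketch below is a minimal example rather than a definitive tool: the function names are made up for illustration, and the base URL is assumed to be the same localhost:9200 used in the health check above.

```python
import urllib.request


def unassigned_primaries(cat_shards_text):
    """Return (index, shard) pairs for primary shards in state UNASSIGNED.

    Expects the plain-text output of
    GET /_cat/shards?h=index,shard,prirep,state
    where each row is: index, shard number, p (primary) or r (replica), state.
    """
    result = []
    for line in cat_shards_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        index, shard, prirep, state = fields[:4]
        if prirep == "p" and state == "UNASSIGNED":
            result.append((index, int(shard)))
    return result


def fetch_cat_shards(base_url="http://localhost:9200"):
    """Fetch the shard table as text (base_url is an assumption)."""
    url = base_url + "/_cat/shards?h=index,shard,prirep,state"
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")
```

Calling `unassigned_primaries(fetch_cat_shards())` against the node would list the shards that keep the status red. Unassigned replicas are deliberately ignored here, since on their own they only make the cluster yellow.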


(Thy Fere) #3

Thanks. What could be the reason for the timeout? I am a newbie to the ELK stack. Btw, what are primaries?


(Mark Walkom) #4

NFS is slow by design, and you are on a consumer NAS, which is slower again.

Also take a read of https://www.elastic.co/guide/en/elasticsearch/guide/master/_how_primary_and_replica_shards_interact.html


(Thy Fere) #5

But if we need reliable external storage, what should we use, a SAN volume? So, the cluster is in red status due to a slow NAS? I am using only one node; should I turn off the replicas?


(Mark Walkom) #6

Turning off replicas won't help; it just looks like the NAS is slow.
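
For reference, since turning off replicas came up: on a single node, replica shards can never be assigned, so they make up part of the unassigned count. Setting the replica count to 0 removes those, but it does not touch the unassigned primaries that make the status red. A minimal sketch of building that settings request, assuming the node answers on localhost:9200 and using a hypothetical helper name:

```python
import json
import urllib.request


def replica_settings_request(replicas, base_url="http://localhost:9200",
                             index_pattern="_all"):
    """Build a PUT request for the index settings API.

    base_url and index_pattern are assumptions for illustration;
    "_all" targets every index on the cluster.
    """
    body = json.dumps({"index": {"number_of_replicas": replicas}})
    return urllib.request.Request(
        url="{}/{}/_settings".format(base_url, index_pattern),
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
```

Sending it would be `urllib.request.urlopen(replica_settings_request(0))`. The request is only built here, not sent, so nothing changes on the cluster until that call is made.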


(Thy Fere) #7

So, if you had to suggest a solution for a mounted volume to someone, would you suggest a NAS, a SAN volume, or something else?


(Mark Walkom) #8

Local is king.
NAS and SAN can work; they just need to be decent quality.


(Thy Fere) #9

In fact, it's a VMware virtual machine; therefore, we want to use a reliable data storage source.

