Why is cluster status red? Please help

Hi,

I ran this command: curl -XGET localhost:9200/_cluster/health?pretty=true and got the following result:

"cluster_name" : "elasticsearch",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 504,
"active_shards" : 504,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 688,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 1,
"number_of_in_flight_fetch" : 0

I am running Elasticsearch and Kibana on a Windows box. The following are my current log entries:

[2016-01-20 16:30:51,130][WARN ][cluster.action.shard ] [ABC-Elastic1] [logstash-2016.01.01][2] received shard failed for [logstash-2016.01.01][2], node[j_ZQyQsvRRGBvPo1NYifeA], [P], s[INITIALIZING], unassigned_info[[reason=ALLOCATION_FAILED], at[2016-01-20T13:30:45.254Z], details[shard failure [failed recovery][IndexShardGatewayRecoveryException[[logstash-2016.01.01][2] failed recovery]; nested: ElasticsearchException[failed to obtain write log pre translog recovery]; nested: LockObtainFailedException[Lock obtain timed out: NativeFSLock@\xyz-synology1.abc.xyz.edu\syslog\elasticsearch\nodes\0\indices\logstash-2016.01.01\2\index\write.lock]; ]]], indexUUID [XnyU0jKKRJ6vsz7j3xdRrA], reason [shard failure [failed recovery][IndexShardGatewayRecoveryException[[logstash-2016.01.01][2] failed recovery]; nested: ElasticsearchException[failed to obtain write log pre translog recovery]; nested: LockObtainFailedException[Lock obtain timed out: NativeFSLock@\xyz-synology1.abc.xyz.edu\syslog\elasticsearch\nodes\0\indices\logstash-2016.01.01\2\index\write.lock]; ]]
[2016-01-20 16:30:57,928][INFO ][monitor.jvm ] [ABC-Elastic1] [gc][old][1343][222] duration [6.5s], collections [1]/[7s], total [6.5s]/[5.2m], memory [2.8gb]->[2.8gb]/[2.9gb], all_pools {[young] [146.4kb]->[8mb]/[133.1mb]}{[survivor] [0b]->[0b]/[16.6mb]}{[old] [2.8gb]->[2.8gb]/[2.8gb]}
[2016-01-20 16:30:58,459][INFO ][cluster.metadata ] [ABC-Elastic1] [logstash-2016.01.20] update_mapping [nxlog-json] (dynamic)
[2016-01-20 16:31:04,835][INFO ][monitor.jvm ] [ABC-Elastic1] [gc][old][1345][223] duration [5.5s], collections [1]/[5.9s], total [5.5s]/[5.3m], memory [2.9gb]->[2.8gb]/[2.9gb], all_pools {[young] [133.1mb]->[12.6mb]/[133.1mb]}{[survivor] [15mb]->[0b]/[16.6mb]}{[old] [2.8gb]->[2.8gb]/[2.8gb]}
[2016-01-20 16:31:05,194][INFO ][cluster.metadata ] [ABC-Elastic1] [logstash-2016.01.20] update_mapping [nxlog-json] (dynamic)
[2016-01-20 16:31:06,788][WARN ][indices.cluster ] [ABC-Elastic1] [[logstash-2016.01.01][2]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [logstash-2016.01.01][2] failed recovery
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:162)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.ElasticsearchException: failed to obtain write log pre translog recovery
at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:228)
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:112)
... 3 more
Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@\xyz-synology1.abc.xyz.edu\syslog\elasticsearch\nodes\0\indices\logstash-2016.01.01\2\index\write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:89)
at org.elasticsearch.common.lucene.Lucene.acquireLock(Lucene.java:149)
at org.elasticsearch.common.lucene.Lucene.acquireWriteLock(Lucene.java:140)
at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:183)
... 4 more

Basically it's red because you have unassigned primaries.
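
To see exactly which shards are unassigned (and which of those are primaries), the cat shards API should give you a per-shard listing; something like this, assuming the default host and port:

curl -XGET "localhost:9200/_cat/shards?v" | grep UNASSIGNED

(On Windows, pipe through findstr UNASSIGNED instead of grep.) The prirep column shows p for primary and r for replica; any p stuck in state UNASSIGNED is what is turning the cluster red.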

It looks like you are using NFS and it's timing out.
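
Once the lock on the NAS frees up, one way to trigger another recovery attempt on the affected index is to close and reopen it. This is just a sketch against the index named in your logs, and it assumes the shard data on disk is intact:

curl -XPOST "localhost:9200/logstash-2016.01.01/_close"
curl -XPOST "localhost:9200/logstash-2016.01.01/_open"

Reopening the index makes Elasticsearch retry allocation and recovery of its shards.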

Thanks. What could be the reason for the timeouts? I am a newbie to the ELK stack. By the way, what are primaries?

NFS is slow by design, and on top of that you are on a consumer NAS, which is slower still.

Also take a read of https://www.elastic.co/guide/en/elasticsearch/guide/master/_how_primary_and_replica_shards_interact.html

But if we need reliable external storage, what should we use, a SAN volume? So the cluster is in red status because of the slow NAS? I am using only one node; should I turn off replicas?

Turning off replicas won't help; it just looks like the NAS is slow.
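
For completeness: on a single node, replica shards can never be assigned anyway, so they always show up in the unassigned count. If you do want to drop them (it will lower the unassigned number but will not fix the red primaries), something like this should work, applied across all indices:

curl -XPUT "localhost:9200/_settings" -d '{ "index": { "number_of_replicas": 0 } }'

On Windows cmd the single quotes may need to be swapped for escaped double quotes.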

So if you had to suggest a solution for a mounted volume, would you suggest NAS, a SAN volume, or something else?

Local is king.
NAS and SAN can work; they just need to be decent quality.
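
If you do move the data onto local disk, it is mostly a matter of pointing path.data at it in elasticsearch.yml and restarting the node; the path below is only an example:

# elasticsearch.yml -- example local data directory, adjust to your layout
path.data: D:\elasticsearch\data

Existing data would have to be copied from the share to the new location (with the node stopped) before restarting.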

Actually, it's a VMware virtual machine, so we want to use whatever data storage option is reliable.