Elasticsearch Status Red - Help Required

Hi Team,

After creating a certain number of indices in Elasticsearch (ES), when I try to restart the server it is not able to recover the previously created indices.

As a result, the ES web URL shows the status as RED and the node is not running.

Earlier I changed the data directory and created the indices again. It then ran for a few indices, but now I am facing the same issue again.

It is very difficult to recreate all the indices each time by changing the data directory.

Could you please help me here? Is there any way to resolve the issue without changing the data directory and without recreating the indices?

Thanks & Regards
Uttam Maji

Can you please share your cluster details (how many nodes), your ES version, and
also the logs?

Hi Ravi,

Please find details below

  1. single node

  2. ES version 2.2.0

  3. ES Log file details

[87]: index [.marvel-es-2016.07.28], type [shards], id [4h43RPMqRNat2B_z5DKfTQ:sLjUqtgoQmSPmUca4bSgnA:agl_cb_sd-ade0-ab00_disk_space:2:p], message [NodeClosedException[node closed {Tyrannus}{sLjUqtgoQmSPmUca4bSgnA}{127.0.0.1}{127.0.0.1:9300}]]

Caused by: java.nio.file.FileSystemException: /installdir/ELK/elasticsearch-2.2.0/data1/elasticsearch/nodes/0/indices/agl_cb_sd-3461-1598_disk_space/4/translog/translog.ckp: Too many open files

[2016-07-28 04:45:29,056][WARN ][cluster.action.shard ] [Tyrannus] [cb_jpt2_disk_space][2] received shard failed for [cb_jpt2_disk_space][2], node[sLjUqtgoQmSPmUca4bSgnA], [P], v[33], s[INITIALIZING], a[id=8-MyaId0SmS7U7dXwka7aA], unassigned_info[[reason=ALLOCATION_FAILED], at[2016-07-28T08:45:28.781Z], details[failed recovery, failure IndexShardRecoveryException[failed to recovery from gateway]; nested: EngineCreationFailureException[failed to open reader on writer]; nested: FileSystemException[/installdir/ELK/elasticsearch-2.2.0/data1/elasticsearch/nodes/0/indices/cb_jpt2_disk_space/2/index/_a_Lucene50_0.tim: Too many open files]; ]], indexUUID [lJMKGd_wSFCW1b_WyZlRng], message [master {Tyrannus}{sLjUqtgoQmSPmUca4bSgnA}{127.0.0.1}{127.0.0.1:9300} marked shard as initializing, but shard is marked as failed, resend shard failure], failure [Unknown]

[2016-07-28 04:45:29,057][INFO ][node ] [Tyrannus] stopping ...
[2016-07-28 04:45:29,072][WARN ][cluster.action.shard ] [Tyrannus] [agl_cb_1598_jvm_status][1] received shard failed for [agl_cb_1598_jvm_status][1], node[sLjUqtgoQmSPmUca4bSgnA], [P], v[17], s[INITIALIZING], a[id=DAJ3npIsT2aZLodNNluo8w], unassigned_info[[reason=ALLOCATION_FAILED], at[2016-07-28T08:45:28.841Z], details[failed recovery, failure IndexShardRecoveryException[failed to recovery from gateway]; nested: EngineCreationFailureException[failed to open reader on writer]; nested: FileSystemException[/installdir/ELK/elasticsearch-2.2.0/data1/elasticsearch/nodes/0/indices/agl_cb_1598_jvm_status/1/index: Too many open files]; ]], indexUUID [wBy1o9P6SMiIi5lGx63izw], message [master {Tyrannus}{sLjUqtgoQmSPmUca4bSgnA}{127.0.0.1}{127.0.0.1:9300} marked shard as initializing, but shard is marked as failed, resend shard failure], failure [Unknown]
[2016-07-28 04:45:29,075][WARN ][indices.memory ] [Tyrannus] failed to set shard [agl_cb_sd-3461-1598_disk_space][4] index buffer to [4mb]

From your logs ("Too many open files"), I think the following will solve it. You didn't mention your OS.

For Red Hat Enterprise Linux Server release 6.4 (Santiago):

Open the file
vim /etc/security/limits.conf
and add

`* - nofile 999999`
`* soft nofile 999999`
`* hard nofile 999999`

and then try to bring up your server.
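Entries in /etc/security/limits.conf are applied when a new login session starts, so a node started from an old shell may still run with the old limit. As a rough sketch (not something posted in this thread), the limit a running process actually has can be read from /proc/<pid>/limits on Linux. The helper names and the sample text below are made up for illustration; the ES process id has to be supplied by you.

```python
import re
from pathlib import Path


def parse_max_open_files(limits_text):
    """Return (soft, hard) for 'Max open files' from /proc/<pid>/limits text.

    Values are returned as strings because a limit may be 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns are separated by runs of spaces: name, soft, hard, units.
            fields = re.split(r"\s{2,}", line.strip())
            return fields[1], fields[2]
    raise ValueError("no 'Max open files' line found")


def limits_for_pid(pid):
    """Read the limits of a live process (Linux only; pid is supplied by the caller)."""
    return parse_max_open_files(Path(f"/proc/{pid}/limits").read_text())


# Illustrative sample, not taken from the poster's machine.
SAMPLE = (
    "Limit                     Soft Limit           Hard Limit           Units\n"
    "Max cpu time              unlimited            unlimited            seconds\n"
    "Max open files            4096                 4096                 files\n"
)

print(parse_max_open_files(SAMPLE))
```

If the values printed for the Elasticsearch process are still the old ones, the new limits.conf entries have not reached that process yet.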

The OS is Linux.

I tried adding those entries to /etc/security/limits.conf, but I still see the same issue. I got the logs below in the terminal.

[96]: index [.marvel-es-2016.08.01], type [shards], id [s3fYEnEyTfKirOQr57vQQQ:_na:agl_cb_57da_jvm_status:1:r], message [NodeClosedException[node closed {Joe Fixit}{iVjAAceVS9SuYtdZhuDjVQ}{127.0.0.1}{127.0.0.1:9300}]]
[97]: index [.marvel-es-2016.08.01], type [shards], id [s3fYEnEyTfKirOQr57vQQQ:_na:agl_cb_57da_jvm_status:3:p], message [NodeClosedException[node closed {Joe Fixit}{iVjAAceVS9SuYtdZhuDjVQ}{127.0.0.1}{127.0.0.1:9300}]]
[98]: index [.marvel-es-2016.08.01], type [shards], id [s3fYEnEyTfKirOQr57vQQQ:_na:agl_cb_57da_jvm_status:3:r], message [NodeClosedException[node closed {Joe Fixit}{iVjAAceVS9SuYtdZhuDjVQ}{127.0.0.1}{127.0.0.1:9300}]]
[99]: index [.marvel-es-2016.08.01], type [shards], id [s3fYEnEyTfKirOQr57vQQQ:_na:agl_cb_57da_jvm_status:4:p], message [NodeClosedException[node closed {Joe Fixit}{iVjAAceVS9SuYtdZhuDjVQ}{127.0.0.1}{127.0.0.1:9300}]]]
at org.elasticsearch.marvel.agent.exporter.local.LocalBulk.flush(LocalBulk.java:114)
at org.elasticsearch.marvel.agent.exporter.ExportBulk$Compound.flush(ExportBulk.java:101)

You said you created a 'certain number of indices'. How many indices/shards do you have on the node? How much heap do you have assigned to the node?

I have 117 indices in total and am using the default heap size. How can I check this?

If you are using the default configuration, your 117 indices might each consist of 5 primary shards, which gives 117 x 5 = 585 shards. This is a lot for a node with the default heap size of 1GB. You can find out how much heap you have through the node stats API, e.g.: curl -XGET 'http://localhost:9200/_nodes/stats/jvm'

If you are running with default settings I would recommend reducing the number of indices/shards and/or increasing the heap size.
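The arithmetic and the heap check above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: 5 primary shards per index is the default mentioned above and not a confirmed setting for this cluster, the function names are hypothetical, and the sample response is a trimmed, made-up stand-in for what `_nodes/stats/jvm` returns (only the `heap_max_in_bytes` field is read).

```python
import json
from urllib.request import urlopen


def primary_shard_count(num_indices, shards_per_index=5):
    """Primary shards on the node, assuming every index uses the same shard count."""
    return num_indices * shards_per_index


def heap_max_by_node(node_stats):
    """Map node name -> maximum heap in MiB from a parsed _nodes/stats/jvm response."""
    return {
        node["name"]: node["jvm"]["mem"]["heap_max_in_bytes"] // (1024 * 1024)
        for node in node_stats["nodes"].values()
    }


def fetch_node_stats(base_url="http://localhost:9200"):
    """Fetch live stats; the base URL is an assumption matching the curl example."""
    with urlopen(f"{base_url}/_nodes/stats/jvm") as resp:
        return json.load(resp)


# Made-up sample, trimmed to the fields used here.
SAMPLE_STATS = {
    "nodes": {
        "node-id-1": {
            "name": "example-node",
            "jvm": {"mem": {"heap_max_in_bytes": 1073741824}},
        }
    }
}

print(primary_shard_count(117))
print(heap_max_by_node(SAMPLE_STATS))
```

Against a running node, `heap_max_by_node(fetch_node_stats())` would report the configured maximum heap per node. Each shard is a separate Lucene index with its own set of files, which is why the shard count also drives the number of open file handles.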

Hi Christian,

I tried increasing the heap size to 4GB, but it is still failing.

Could you please help?

[2016-08-02 09:24:29,858][INFO ][gateway ] [Emplate] recovered [118] indices into cluster_state

[2016-08-02 09:24:35,283][WARN ][indices.cluster ] [Emplate] [[agl_cb_sd-3461-1598_disk_space][4]] marking and sending shard failed due to [failed recovery]
[agl_cb_sd-3461-1598_disk_space][[agl_cb_sd-3461-1598_disk_space][4]] IndexShardRecoveryException[failed to recovery from gateway]; nested: EngineCreationFailureException[failed to create engine]; nested: NoSuchFileException[/installdir/ELK/elasticsearch-2.2.0/data1/elasticsearch/nodes/0/indices/agl_cb_sd-3461-1598_disk_space/4/translog/translog-6.ckp];
at org.elasticsearch.index.shard.StoreRecoveryService.recoverFromStore(StoreRecoveryService.java:254)
at org.elasticsearch.index.shard.StoreRecoveryService.access$100(StoreRecoveryService.java:56)
at org.elasticsearch.index.shard.StoreRecoveryService$1.run(StoreRecoveryService.java:129)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Caused by: [agl_cb_sd-3461-1598_disk_space][[agl_cb_sd-3461-1598_disk_space][4]] EngineCreationFailureException[failed to create engine]; nested: NoSuchFileException[/installdir/ELK/elasticsearch-2.2.0/data1/elasticsearch/nodes/0/indices/agl_cb_sd-3461-1598_disk_space/4/translog/translog-6.ckp];
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:156)
at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:25)
at org.elasticsearch.index.shard.IndexShard.newEngine(IndexShard.java:1450)
at org.elasticsearch.index.shard.IndexShard.createNewEngine(IndexShard.java:1434)
at org.elasticsearch.index.shard.IndexShard.internalPerformTranslogRecovery(IndexShard.java:925)
at org.elasticsearch.index.shard.IndexShard.performTranslogRecovery(IndexShard.java:897)
at org.elasticsearch.index.shard.StoreRecoveryService.recoverFromStore(StoreRecoveryService.java:245)
... 5 more