Retrying failed action with response code: 503 unavailable_shards_exception, reason - logstash primary shard is not active

After upgrading the ELK stack to 7.6.2 (the latest version), I tried to restart all the services, but Elasticsearch is showing errors.

The errors are shown below:

Apr 03 21:56:46 CLMserver logstash[3216]: [2020-04-03T21:56:46,255][INFO ][logstash.outputs.elasticsearch][main] retrying failed action with response code: 503 ({"type"=>"unavailable_shards_exception", "reason"=>"[logstash][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[logstash][0]] containing [2] requests]"})

Apr 03 21:56:46 CLMserver logstash[3216]: [2020-04-03T21:56:46,255][INFO ][logstash.outputs.elasticsearch][main] Retrying individual bulk actions that failed or were rejected by the previous bulk request. {:count=>2}

Apr 03 21:56:48 CLMserver logstash[3216]: [2020-04-03T21:56:48,092][INFO ][logstash.outputs.elasticsearch][main] retrying failed action with response code: 503 ({"type"=>"unavailable_shards_exception", "reason"=>"[logstash][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[logstash][0]] containing [index {[logstash][_doc][8OPcQHEBpL5BzkD_2yw6], source[n/a, actual length: [2.3kb], max length: 2kb]}]]"})

Apr 03 21:56:48 CLMserver logstash[3216]: [2020-04-03T21:56:48,093][INFO ][logstash.outputs.elasticsearch][main] Retrying individual bulk actions that failed or were rejected by the previous bulk request. {:count=>1}

Apr 03 21:56:48 CLMserver logstash[3216]: [2020-04-03T21:56:48,222][INFO ][logstash.outputs.elasticsearch][main] retrying failed action with response code: 503 ({"type"=>"unavailable_shards_exception", "reason"=>"[logstash][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[logstash][0]] containing [index {[logstash][_doc][8ePcQHEBpL5BzkD_2yy9], source[n/a, actual length: [3.9kb], max length: 2kb]}]]"})

Apr 03 21:56:48 CLMserver logstash[3216]: [2020-04-03T21:56:48,223][INFO ][logstash.outputs.elasticsearch][main] Retrying individual bulk actions that failed or were rejected by the previous bulk request. {:count=>1}

Apr 03 21:56:54 CLMserver logstash[3216]: [2020-04-03T21:56:54,283][INFO ][logstash.outputs.elasticsearch][main] retrying failed action with response code: 503 ({"type"=>"unavailable_shards_exception", "reason"=>"[logstash][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[logstash][0]] containing [index {[logstash][_doc][8uPcQHEBpL5BzkD_8yxp], source[n/a, actual length: [2.2kb], max length: 2kb]}]]"})

Apr 03 21:56:54 CLMserver logstash[3216]: [2020-04-03T21:56:54,285][INFO ][logstash.outputs.elasticsearch][main] Retrying individual bulk actions that failed or were rejected by the previous bulk request. {:count=>1}

Apr 03 21:57:11 CLMserver logstash[3216]: [2020-04-03T21:57:11,919][INFO ][logstash.outputs.elasticsearch][main] retrying failed action with response code: 503 ({"type"=>"unavailable_shards_exception", "reason"=>"[logstash][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[logstash][0]] containing [index {[logstash][_doc][FePdQHEBpL5BzkD_OC9N], source[n/a, actual length: [4.5kb], max length: 2kb]}]]"})

Apr 03 21:57:11 CLMserver logstash[3216]: [2020-04-03T21:57:11,919][INFO ][logstash.outputs.elasticsearch][main] Retrying individual bulk actions that failed or were rejected by the previous bulk request. {:count=>1}

Cluster health

I have provided the cluster health status below:

{
  "cluster_name" : "elasticsearch",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 1431,
  "active_shards" : 1431,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 3812,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 27.293534236124355
}
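If it helps, I can also provide the per-shard view. As far as I understand, the unassigned shards and the reasons they are unassigned can be listed with something like:

GET _cat/shards?v&h=index,shard,prirep,state,unassigned.reason

I'll post that output if it is useful.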

Elasticsearch Logs

Below are the Elasticsearch logs that I have captured:

[2020-04-05T21:21:31,137][WARN ][o.e.c.r.a.AllocationService] [esnode-1] failing shard [failed shard, shard [logstash-2019.03.15][1], node[MhH-Ta5aTxWragHyPSNk8A], [P], recovery_source[existing store recovery; bootstrap_history_uuid=false], s[INITIALIZING], a[id=oJ_JAxVGTUibQZwejljivw], unassigned_info[[reason=ALLOCATION_FAILED], at[2020-04-05T15:51:30.078Z], failed_attempts[4], failed_nodes[[MhH-Ta5aTxWragHyPSNk8A]], delayed=false, details[failed shard on node [MhH-Ta5aTxWragHyPSNk8A]: failed recovery, failure RecoveryFailedException[[logstash-2019.03.15][1]: Recovery failed on {esnode-1}{MhH-Ta5aTxWragHyPSNk8A}{XqKRjlWDQNahgozLTO_9Fg}{10.100.10.30}{10.100.10.30:9300}{dilm}{ml.machine_memory=134890430464, xpack.installed=true, ml.max_open_jobs=20}]; nested: IndexShardRecoveryException[failed to recover from gateway]; nested: EngineCreationFailureException[failed to create engine]; nested: FileSystemException[/esdata/elasticsearch/nodes/0/indices/tEJNWczOQn-Y_rG4RibSSQ/1/translog/translog-74.ckp: Too many open files]; ], allocation_status[no_valid_shard_copy]], message [failed recovery], failure [RecoveryFailedException[[logstash-2019.03.15][1]: Recovery failed on {esnode-1}{MhH-Ta5aTxWragHyPSNk8A}{XqKRjlWDQNahgozLTO_9Fg}{10.100.10.30}{10.100.10.30:9300}{dilm}{ml.machine_memory=134890430464, xpack.installed=true, ml.max_open_jobs=20}]; nested: IndexShardRecoveryException[failed to recover from gateway]; nested: EngineCreationFailureException[failed to create engine]; nested: FileSystemException[/esdata/elasticsearch/nodes/0/indices/tEJNWczOQn-Y_rG4RibSSQ/1/translog/translog-75.ckp: Too many open files]; ], markAsStale [true]]
org.elasticsearch.indices.recovery.RecoveryFailedException: [logstash-2019.03.15][1]: Recovery failed on {esnode-1}{MhH-Ta5aTxWragHyPSNk8A}{XqKRjlWDQNahgozLTO_9Fg}{10.100.10.30}{10.100.10.30:9300}{dilm}{ml.machine_memory=134890430464, xpack.installed=true, ml.max_open_jobs=20}
        at org.elasticsearch.index.shard.IndexShard.lambda$executeRecovery$21(IndexShard.java:2633) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.action.ActionListener$1.onFailure(ActionListener.java:71) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.lambda$recoveryListener$6(StoreRecovery.java:352) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.action.ActionListener$1.onFailure(ActionListener.java:71) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:287) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:94) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:1866) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:73) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:692) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.6.2.jar:7.6.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:830) [?:?]
Caused by: org.elasticsearch.index.shard.IndexShardRecoveryException: failed to recover from gateway
        at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:435) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:96) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:285) ~[elasticsearch-7.6.2.jar:7.6.2]
        ... 8 more
Caused by: org.elasticsearch.index.engine.EngineCreationFailureException: failed to create engine
        at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:239) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:191) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:25) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.index.shard.IndexShard.innerOpenEngineAndTranslog(IndexShard.java:1626) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.index.shard.IndexShard.openEngineAndRecoverFromTranslog(IndexShard.java:1592) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:430) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:96) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:285) ~[elasticsearch-7.6.2.jar:7.6.2]
        ... 8 more
Caused by: java.nio.file.FileSystemException: /esdata/elasticsearch/nodes/0/indices/tEJNWczOQn-Y_rG4RibSSQ/1/translog/translog-75.ckp: Too many open files
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:100) ~[?:?]
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116) ~[?:?]
        at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:219) ~[?:?]
        at java.nio.file.Files.newByteChannel(Files.java:374) ~[?:?]
        at java.nio.file.Files.newByteChannel(Files.java:425) ~[?:?]
        at org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:77) ~[lucene-core-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:14]
        at org.elasticsearch.index.translog.Checkpoint.read(Checkpoint.java:182) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.index.translog.Translog.recoverFromFiles(Translog.java:247) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.index.translog.Translog.<init>(Translog.java:193) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.index.engine.InternalEngine.openTranslog(InternalEngine.java:507) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:219) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:191) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:25) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.index.shard.IndexShard.innerOpenEngineAndTranslog(IndexShard.java:1626) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.index.shard.IndexShard.openEngineAndRecoverFromTranslog(IndexShard.java:1592) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:430) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:96) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:285) ~[elasticsearch-7.6.2.jar:7.6.2]
        ... 8 more

Please help me rectify this.

You probably have too many shards per node.

May I suggest you look at the following resources about sizing:

https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing

And https://www.elastic.co/webinars/using-rally-to-get-your-elasticsearch-cluster-size-right
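As a quick check, you can see how many shards each node is currently holding with:

GET _cat/allocation?v

And since your stack trace shows "Too many open files", it is also worth verifying the file descriptor limit the node is running with (Elasticsearch recommends at least 65535), for example:

GET _nodes/stats/process?filter_path=**.max_file_descriptors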

Can you help me figure out how to do this for my setup? I'll provide all the logs required.

Add more nodes, delete indices you don't need, increase the heap size if you have 64 GB of RAM... shrink the indices...

Many things to do.

But to summarize: reduce the number of shards per node. No more than 20 shards per GB of heap.
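For example, your logs show daily logstash-2019.* indices; assuming the old ones are no longer needed, deleting them frees their shards (double-check what a wildcard matches before running it):

DELETE /logstash-2019.03.15

or, if wildcard deletes are allowed on your cluster, something like DELETE /logstash-2019.*. Indices you need to keep can also be reduced to a single shard with the _shrink API once they are made read-only.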

What is your heap size?

My heap size is 24 GB.

So no more than 480 shards.
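For reference: 24 GB of heap × 20 shards per GB = 480 shards, while your cluster health shows 1431 active + 3812 unassigned = 5243 shards on a single node, roughly ten times that guideline.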
