Kibana Monitoring Issue and Sharding Problem

Hi there,
I have two problems which are probably related to each other:

I have a basic cluster with 2 data nodes.
But the cluster status is always "Yellow", and these 2 nodes cannot distribute the primary shards and replicas between them.
So I ran a reroute: curl -XPOST "localhost:9200/_cluster/reroute?retry_failed=true"
It returned some output but didn't work properly (there are still a number of unassigned shards).
These are some of the logs:

{"acknowledged":true,"state":{"cluster_uuid":"nembwwEEQGCpPb0zW0C8_A","version":261913,"state_uuid":"vK1LtVbzQ8-j7UNJ7DV-6w","master_node":"veBpqUZrSIC9pX1SNrTvug","blocks":{},"nodes":{"2m_WCRVhQk6SAHo6zIG6cA":{"name":"elk2","ephemeral_id":"icEWYUHrTS2lxW7lGJ9sBw","transport_address":"172.22.34.37:9300","attributes":{"ml.machine_memory":"3973464064","ml.max_open_jobs":"20","xpack.installed":"true"}},"veBpqUZrSIC9pX1SNrTvug":{"name":"elk1","ephemeral_id":"IAcFFk3TROapKhfq2FB7tg","transport_address":"172.22.34.36:9300","attributes":{"ml.machine_memory":"3973468160","ml.max_open_jobs":"20","xpack.installed":"true"}}},"routing_table":{"indices":{"testlog-2020.07.15":{"shards":{"0":[{"state":"STARTED","primary":true,"node":"veBpqUZrSIC9pX1SNrTvug","relocating_node":null,"shard":0,"index":"testlog-2020.07.15","allocation_id":{"id":"kM1UZhkxSbe1ModVTQxkmw"}},{"state":"UNASSIGNED","primary":false,"node":null,"relocating_node":null,"shard":0,"index":"testlog-2020.07.15","recovery_source":{"type":"PEER"},"unassigned_info":{"reason":"NODE_LEFT","at":"2021-04-26T05:01:20.117Z","delayed":false,"details":"node_left [2m_WCRVhQk6SAHo6zIG6cA]","allocation_status":"no_attempt"}}]}},"testlog-2020.11.27":{"shards":{"0":[{"state":"STARTED","primary":true,"node":"veBpqUZrSIC9pX1SNrTvug","relocating_node":null,"shard":0,"index":"testlog-2020.11.27","allocation_id":{"id":"WmSFM1m2T7uzCcc66R0A_w"}},{"state":"UNASSIGNED","primary":false,"node":null,"relocating_node":null,"shard":0,"index":"testlog-2020.11.27","recovery_source":{"type":"PEER"},"unassigned_info":{"reason":"NODE_LEFT","at":"2021-04-26T05:01:20.117Z","delayed":false,"details":"node_left 
[2m_WCRVhQk6SAHo6zIG6cA]","allocation_status":"no_attempt"}}]}},"testlog-2020.09.24":{"shards":{"0":[{"state":"STARTED","primary":true,"node":"veBpqUZrSIC9pX1SNrTvug","relocating_node":null,"shard":0,"index":"testlog-2020.09.24","allocation_id":{"id":"zWPF9tY4SpeEjlCrbWqJeA"}},{"state":"UNASSIGNED","primary":false,"node":null,"relocating_node":null,"shard":0,"index":"testlog-2020.09.24","recovery_source":{"type":"PEER"},"unassigned_info":{"reason":"NODE_LEFT","at":"2021-04-24T09:47:12.975Z","delayed":false,"details":"node_left [2m_WCRVhQk6SAHo6zIG6cA]","allocation_status":"no_attempt"}}]}}...
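To see which shards are unassigned and why, without digging through the full cluster state, a quick check (a sketch, assuming the cluster listens on localhost:9200) is the _cat/shards and allocation explain APIs:

```
# List shards that are not STARTED, with the reason they are unassigned
curl -s "localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason" | grep -v STARTED

# Ask the cluster why the next unassigned shard cannot be allocated
curl -s "localhost:9200/_cluster/allocation/explain?pretty"
```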

This is my Kibana monitoring:

Besides that,
I have another problem: I see some errors in Stack Monitoring:

My RAM and JVM heap look OK (each node has 4 GB of RAM and a 1.9 GB JVM heap), so I don't know where the problem is.
These are the relevant logs in Kibana:

{"type":"log","@timestamp":"2021-04-26T05:36:07Z","tags":["status","plugin:spaces@7.6.2","error"],"pid":4153,"state":"red","message":"Status changed from red to red - [parent] Data too large, data for [<http_request>] would be [989246376/943.4mb], which is larger than the limit of [986932838/941.2mb], real usage: [989246376/943.4mb], new bytes reserved: [0/0b], usages [request=0/0b, fielddata=44990/43.9kb, in_flight_requests=0/0b, accounting=60808081/57.9mb]: [circuit_breaking_exception] [parent] Data too large, data for [<http_request>] would be [989246376/943.4mb], which is larger than the limit of [986932838/941.2mb], real usage: [989246376/943.4mb], new bytes reserved: [0/0b], usages [request=0/0b, fielddata=44990/43.9kb, in_flight_requests=0/0b, accounting=60808081/57.9mb], with { bytes_wanted=989246376 & bytes_limit=986932838 & durability=\"PERMANENT\" }","prevState":"red","prevMsg":"[parent] Data too large, data for [<http_request>] would be [988629224/942.8mb], which is larger than the limit of [986932838/941.2mb], real usage: [988629224/942.8mb], new bytes reserved: [0/0b], usages [request=0/0b, fielddata=44990/43.9kb, in_flight_requests=0/0b, accounting=60808081/57.9mb]: [circuit_breaking_exception] [parent] Data too large, data for [<http_request>] would be [988629224/942.8mb], which is larger than the limit of [986932838/941.2mb], real usage: [988629224/942.8mb], new bytes reserved: [0/0b], usages [request=0/0b, fielddata=44990/43.9kb, in_flight_requests=0/0b, accounting=60808081/57.9mb], with { bytes_wanted=988629224 & bytes_limit=986932838 & durability=\"PERMANENT\" }"}

And this is one of the Elasticsearch logs at the moment:

[2021-04-26T10:11:53,115][DEBUG][o.e.a.g.TransportGetAction] [elk2] null: failed to execute [get [.kibana][_doc][space:default]: routing [null]]
org.elasticsearch.transport.RemoteTransportException: [elk1][172.22.34.36:9300][indices:data/read/get[s]]
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [990432222/944.5mb], which is larger than the limit of [986932838/941.2mb], real usage: [990431960/944.5mb], new bytes reserved: [262/262b], usages [request=0/0b, fielddata=47965/46.8kb, in_flight_requests=262/262b, accounting=60837409/58mb]
        at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:343) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:128) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.transport.InboundHandler.handleRequest(InboundHandler.java:171) [elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:119) [elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:103) [elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:667) [elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:62) [transport-netty4-client-7.6.2.jar:7.6.2]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.43.Final.jar:4.1.43.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.43.Final.jar:4.1.43.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [netty-transport-4.1.43.Final.jar:4.1.43.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:326) [netty-codec-4.1.43.Final.jar:4.1.43.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:300) [netty-codec-4.1.43.Final.jar:4.1.43.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.43.Final.jar:4.1.43.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.43.Final.jar:4.1.43.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [netty-transport-4.1.43.Final.jar:4.1.43.Final]
        at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) [netty-handler-4.1.43.Final.jar:4.1.43.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.43.Final.jar:4.1.43.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.43.Final.jar:4.1.43.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [netty-transport-4.1.43.Final.jar:4.1.43.Final]
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1422) [netty-transport-4.1.43.Final.jar:4.1.43.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.43.Final.jar:4.1.43.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.43.Final.jar:4.1.43.Final]
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:931) [netty-transport-4.1.43.Final.jar:4.1.43.Final]
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.43.Final.jar:4.1.43.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:700) [netty-transport-4.1.43.Final.jar:4.1.43.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:600) [netty-transport-4.1.43.Final.jar:4.1.43.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:554) [netty-transport-4.1.43.Final.jar:4.1.43.Final]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:514) [netty-transport-4.1.43.Final.jar:4.1.43.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1050) [netty-common-4.1.43.Final.jar:4.1.43.Final]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.43.Final.jar:4.1.43.Final]
        at java.lang.Thread.run(Thread.java:830) [?:?]
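The circuit_breaking_exception above means the JVM heap is effectively full: the parent breaker limit of 941.2 MB is 95% of the heap, which suggests the node is actually running with roughly a 1 GB heap rather than 1.9 GB. One way to raise the heap (a sketch, assuming the default package location of jvm.options; the heap should be at most ~50% of the node's RAM) is:

```
# /etc/elasticsearch/jvm.options
# Fixed-size heap; Xms and Xmx should be set to the same value.
-Xms2g
-Xmx2g
```

The live breaker state can be inspected with GET _nodes/stats/breaker, which reports each breaker's configured limit and estimated usage.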

Could you help me with these problems?
How can I fix the monitoring issue, and why doesn't my status change to "Green"?

Thanks in advance

TL;DR: you have way too many shards for your heap size. Either increase the heap size or reduce your shard count.
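A common rule of thumb (a guideline, not a hard limit enforced by Elasticsearch) is to aim for at most around 20 shards per GB of JVM heap. A quick sketch of that budget:

```shell
#!/bin/sh
# Rough shard budget: ~20 shards per GB of JVM heap (rule of thumb, not a hard limit)
heap_gb=1                      # approximate heap per node, in GB (assumed value)
shards_per_gb=20
budget=$((heap_gb * shards_per_gb))
echo "recommended max shards per node: $budget"
```

Compare that budget against the total shown by `curl -s localhost:9200/_cat/shards | wc -l`; daily indices like testlog-* accumulate shards very quickly.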

Hi,
Yes, you're right. The problem was my heap size. I have to increase the amount of RAM so I can assign more heap.
Thank you so much


Hi again,
I have increased my heap, but this problem still remains:


This is just a part of my Elasticsearch log:

[2021-05-03T15:46:20,196][WARN ][o.e.i.IndexService       ] [elk2] [testlog-2020.12.14] failed to write dangling indices state for index [testlog-2020.12.14/vgLF61z6SB2-LKXkyj03vQ]
org.elasticsearch.gateway.WriteStateException: exception during looking up new generation id
        at org.elasticsearch.gateway.MetaDataStateFormat.write(MetaDataStateFormat.java:225) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.gateway.MetaDataStateFormat.writeAndCleanup(MetaDataStateFormat.java:185) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.index.IndexService.writeDanglingIndicesInfo(IndexService.java:337) [elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.indices.IndicesService$5.doRun(IndicesService.java:1559) [elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:692) [elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.6.2.jar:7.6.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:830) [?:?]
Caused by: java.nio.file.FileSystemException: /var/lib/elasticsearch/nodes/0/indices/vgLF61z6SB2-LKXkyj03vQ/_state: Too many open files
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:100) ~[?:?]
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116) ~[?:?]
        at sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:432) ~[?:?]
        at java.nio.file.Files.newDirectoryStream(Files.java:543) ~[?:?]
        at org.elasticsearch.gateway.MetaDataStateFormat.findMaxGenerationId(MetaDataStateFormat.java:353) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.gateway.MetaDataStateFormat.write(MetaDataStateFormat.java:222) ~[elasticsearch-7.6.2.jar:7.6.2]
        ... 8 more
[2021-05-03T15:46:20,200][WARN ][o.e.i.IndexService       ] [elk2] [testlog-2021.01.25] failed to write dangling indices state for index [testlog-2021.01.25/2fbsoSMhSnK2ecyehTW75w]
org.elasticsearch.gateway.WriteStateException: exception during looking up new generation id
        at org.elasticsearch.gateway.MetaDataStateFormat.write(MetaDataStateFormat.java:225) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.gateway.MetaDataStateFormat.writeAndCleanup(MetaDataStateFormat.java:185) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.index.IndexService.writeDanglingIndicesInfo(IndexService.java:337) [elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.indices.IndicesService$5.doRun(IndicesService.java:1559) [elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:692) [elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.6.2.jar:7.6.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:830) [?:?]
Caused by: java.nio.file.FileSystemException: /var/lib/elasticsearch/nodes/0/indices/2fbsoSMhSnK2ecyehTW75w/_state: Too many open files
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:100) ~[?:?]
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116) ~[?:?]
        at sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:432) ~[?:?]
        at java.nio.file.Files.newDirectoryStream(Files.java:543) ~[?:?]
        at org.elasticsearch.gateway.MetaDataStateFormat.findMaxGenerationId(MetaDataStateFormat.java:353) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.gateway.MetaDataStateFormat.write(MetaDataStateFormat.java:222) ~[elasticsearch-7.6.2.jar:7.6.2]
[2021-05-03T15:49:46,486][WARN ][o.e.i.s.IndexShard       ] [elk2] [testlog-2019.12.10][0] failed to turn off translog retention
org.apache.lucene.store.AlreadyClosedException: engine is closed
        at org.elasticsearch.index.shard.IndexShard.getEngine(IndexShard.java:2528) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.index.shard.IndexShard.trimTranslog(IndexShard.java:1106) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.index.shard.IndexShard$3.doRun(IndexShard.java:1944) [elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:692) [elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.6.2.jar:7.6.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:830) [?:?]
[2021-05-03T15:49:46,486][WARN ][o.e.i.s.IndexShard       ] [elk2] [testlog-2019.11.26][0] failed to turn off translog retention
org.apache.lucene.store.AlreadyClosedException: engine is closed
        at org.elasticsearch.index.shard.IndexShard.getEngine(IndexShard.java:2528) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.index.shard.IndexShard.trimTranslog(IndexShard.java:1106) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.index.shard.IndexShard$3.doRun(IndexShard.java:1944) [elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:692) [elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.6.2.jar:7.6.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:830) [?:?]

I already ran curl -XPOST "localhost:9200/_cluster/reroute?retry_failed=true" in order to redistribute the unassigned shards.
But after that, node2 releases all its data and allocates shards from the beginning again, then stops at about the number shown in the picture above...

 curl -XGET localhost:9200/_cluster/allocation/explain?pretty
{
  "index" : "testlog-2020.05.27",
  "shard" : 0,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "ALLOCATION_FAILED",
    "at" : "2021-05-03T11:17:24.408Z",
    "failed_allocation_attempts" : 5,
    "details" : "failed shard on node [2m_WCRVhQk6SAHo6zIG6cA]: failed recovery, failure RecoveryFailedException[[testlog-2020.05.27][0]: Recovery failed from {elk1}{veBpqUZrSIC9pX1SNrTvug}{whzfgE4zRBSTfdX5JTYJDg}{172.22.34.36}{172.22.34.36:9300}{dilm}{ml.machine_memory=8201326592, ml.max_open_jobs=20, xpack.installed=true} into {elk2}{2m_WCRVhQk6SAHo6zIG6cA}{5RfsPh5dT3yMGUyO6gCpMQ}{172.22.34.37}{172.22.34.37:9300}{dilm}{ml.machine_memory=8201322496, xpack.installed=true, ml.max_open_jobs=20}]; nested: RemoteTransportException[[elk1][172.22.34.36:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: RemoteTransportException[[elk2][172.22.34.37:9300][internal:index/shard/recovery/prepare_translog]]; nested: EngineCreationFailureException[failed to create engine]; nested: FileSystemException[/var/lib/elasticsearch/nodes/0/indices/mEePgYxBSNKI9Jpi2yOSnA/0/index: Too many open files]; ",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "2m_WCRVhQk6SAHo6zIG6cA",
      "node_name" : "elk2",
      "transport_address" : "172.22.34.37:9300",
      "node_attributes" : {
        "ml.machine_memory" : "8201322496",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "max_retry",
          "decision" : "NO",
          "explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2021-05-03T11:17:24.408Z], failed_attempts[5], failed_nodes[[2m_WCRVhQk6SAHo6zIG6cA]], delayed=false, details[failed shard on node [2m_WCRVhQk6SAHo6zIG6cA]: failed recovery, failure RecoveryFailedException[[testlog-2020.05.27][0]: Recovery failed from {elk1}{veBpqUZrSIC9pX1SNrTvug}{whzfgE4zRBSTfdX5JTYJDg}{172.22.34.36}{172.22.34.36:9300}{dilm}{ml.machine_memory=8201326592, ml.max_open_jobs=20, xpack.installed=true} into {elk2}{2m_WCRVhQk6SAHo6zIG6cA}{5RfsPh5dT3yMGUyO6gCpMQ}{172.22.34.37}{172.22.34.37:9300}{dilm}{ml.machine_memory=8201322496, xpack.installed=true, ml.max_open_jobs=20}]; nested: RemoteTransportException[[elk1][172.22.34.36:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: RemoteTransportException[[elk2][172.22.34.37:9300][internal:index/shard/recovery/prepare_translog]]; nested: EngineCreationFailureException[failed to create engine]; nested: FileSystemException[/var/lib/elasticsearch/nodes/0/indices/mEePgYxBSNKI9Jpi2yOSnA/0/index: Too many open files]; ], allocation_status[no_attempt]]]"
        },
        {
          "decider" : "throttling",
          "decision" : "THROTTLE",
          "explanation" : "reached the limit of incoming shard recoveries [2], cluster setting [cluster.routing.allocation.node_concurrent_incoming_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])"
        }
      ]
    },
    {
      "node_id" : "veBpqUZrSIC9pX1SNrTvug",
      "node_name" : "elk1",
      "transport_address" : "172.22.34.36:9300",
      "node_attributes" : {
        "ml.machine_memory" : "8201326592",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "max_retry",
          "decision" : "NO",
          "explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2021-05-03T11:17:24.408Z], failed_attempts[5], failed_nodes[[2m_WCRVhQk6SAHo6zIG6cA]], delayed=false, details[failed shard on node [2m_WCRVhQk6SAHo6zIG6cA]: failed recovery, failure RecoveryFailedException[[testlog-2020.05.27][0]: Recovery failed from {elk1}{veBpqUZrSIC9pX1SNrTvug}{whzfgE4zRBSTfdX5JTYJDg}{172.22.34.36}{172.22.34.36:9300}{dilm}{ml.machine_memory=8201326592, ml.max_open_jobs=20, xpack.installed=true} into {elk2}{2m_WCRVhQk6SAHo6zIG6cA}{5RfsPh5dT3yMGUyO6gCpMQ}{172.22.34.37}{172.22.34.37:9300}{dilm}{ml.machine_memory=8201322496, xpack.installed=true, ml.max_open_jobs=20}]; nested: RemoteTransportException[[elk1][172.22.34.36:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: RemoteTransportException[[elk2][172.22.34.37:9300][internal:index/shard/recovery/prepare_translog]]; nested: EngineCreationFailureException[failed to create engine]; nested: FileSystemException[/var/lib/elasticsearch/nodes/0/indices/mEePgYxBSNKI9Jpi2yOSnA/0/index: Too many open files]; ], allocation_status[no_attempt]]]"
        },
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[testlog-2020.05.27][0], node[veBpqUZrSIC9pX1SNrTvug], [P], s[STARTED], a[id=_RJkffb1Q9uHuq8Jwq3TVg]]"
        },
        {
          "decider" : "throttling",
          "decision" : "THROTTLE",
          "explanation" : "reached the limit of outgoing shard recoveries [2] on the node [veBpqUZrSIC9pX1SNrTvug] which holds the primary, cluster setting [cluster.routing.allocation.node_concurrent_outgoing_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])"
        }
      ]
    }
  ]
}

Did you set the file descriptor limits? See File Descriptors | Elasticsearch Guide [7.12] | Elastic

I already set this in /etc/security/limits.conf:
elasticsearch - nofile 65535
But my unassigned shards still aren't fixed :worried:
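Note that on systemd-based systems, /etc/security/limits.conf only applies to login sessions, not to services started by systemd, which is a common reason a raised nofile limit never reaches Elasticsearch. The limit also has to be set in a systemd override (a sketch, per the Elasticsearch system-configuration docs):

```
# /etc/systemd/system/elasticsearch.service.d/override.conf
[Service]
LimitNOFILE=65535
```

After `systemctl daemon-reload && systemctl restart elasticsearch`, the process picks up the new limit.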

[2021-05-05T15:00:07,220][WARN ][o.e.c.r.a.AllocationService] [elk1] failing shard [failed shard, shard [.monitoring-kibana-7-2021.05.05][0], node[2m_WCRVhQk6SAHo6zIG6cA], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=otqL0ujwSlisYT2lOBGGDQ], unassigned_info[[reason=ALLOCATION_FAILED], at[2021-05-05T10:30:06.517Z], failed_attempts[4], failed_nodes[[2m_WCRVhQk6SAHo6zIG6cA]], delayed=false, details[failed shard on node [2m_WCRVhQk6SAHo6zIG6cA]: shard failure, reason [refresh failed source[peer-recovery]], failure FileSystemException[/var/lib/elasticsearch/nodes/0/indices/Usy1AOLNRDiOK09K3jMvmA/0/index/_1_Lucene80_0.dvm: Too many open files]], allocation_status[no_attempt]], expected_shard_size[1047435], message [shard failure, reason [lucene commit failed]], failure [FileSystemException[/var/lib/elasticsearch/nodes/0/indices/Usy1AOLNRDiOK09K3jMvmA/0/index/_2_Lucene80_0.dvd: Too many open files]], markAsStale [true]]
java.nio.file.FileSystemException: /var/lib/elasticsearch/nodes/0/indices/Usy1AOLNRDiOK09K3jMvmA/0/index/_2_Lucene80_0.dvd: Too many open files
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:100) ~[?:?]
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116) ~[?:?]
        at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:219) ~[?:?]
        at java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:478) ~[?:?]
        at java.nio.file.Files.newOutputStream(Files.java:223) ~[?:?]
        at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:410) ~[lucene-core-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:14]
        at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:406) ~[lucene-core-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:14]
        at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:254) ~[lucene-core-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:14]
        at org.apache.lucene.store.FilterDirectory.createOutput(FilterDirectory.java:74) ~[lucene-core-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:14]
        at org.elasticsearch.index.store.ByteSizeCachingDirectory.createOutput(ByteSizeCachingDirectory.java:130) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.apache.lucene.store.FilterDirectory.createOutput(FilterDirectory.java:74) ~[lucene-core-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:14]
        at org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:44) ~[lucene-core-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:14]
        at org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43) ~[lucene-core-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:14]
        at org.apache.lucene.codecs.lucene80.Lucene80DocValuesConsumer.<init>(Lucene80DocValuesConsumer.java:70) ~[lucene-core-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:14]
        at org.apache.lucene.codecs.lucene80.Lucene80DocValuesFormat.fieldsConsumer(Lucene80DocValuesFormat.java:141) ~[lucene-core-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:14]
        at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.getInstance(PerFieldDocValuesFormat.java:224) ~[lucene-core-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:14]
        at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.getInstance(PerFieldDocValuesFormat.java:160) ~[lucene-core-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:14]
        at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addSortedSetField(PerFieldDocValuesFormat.java:129) ~[lucene-core-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:14]
        at org.apache.lucene.index.SortedSetDocValuesWriter.flush(SortedSetDocValuesWriter.java:221) ~[lucene-core-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:14]
        at org.apache.lucene.index.DefaultIndexingChain.writeDocValues(DefaultIndexingChain.java:263) ~[lucene-core-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:14]
        at org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:138) ~[lucene-core-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:14]
        at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:468) ~[lucene-core-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:14]
        at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:555) ~[lucene-core-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:14]
        at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:722) ~[lucene-core-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:14]
        at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3200) ~[lucene-core-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:14]
        at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3445) ~[lucene-core-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:14]
        at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3410) ~[lucene-core-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:14]
        at org.elasticsearch.index.engine.InternalEngine.commitIndexWriter(InternalEngine.java:2456) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslogInternal(InternalEngine.java:493) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:453) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:131) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.index.shard.IndexShard.recoverLocallyUpToGlobalCheckpoint(IndexShard.java:1445) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:178) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.access$500(PeerRecoveryTargetService.java:79) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:563) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:692) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.6.2.jar:7.6.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:830) [?:?]
[2021-05-05T15:09:06,449][INFO ][o.e.c.m.MetaDataMappingService] [elk1] [testlog-2021.05.05/n6VhET9_SOWlRnVI73VKbQ] update_mapping [_doc]

It seems to say "Too many open files" again... What should I do?

What does the output of GET _nodes/stats/process?filter_path=**.max_file_descriptors show?
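For reference, that request as a curl command (assuming the default localhost:9200 endpoint):

```
curl -s "localhost:9200/_nodes/stats/process?filter_path=**.max_file_descriptors&pretty"
```

If it still reports the old value (often 4096), the raised limit has not reached the Elasticsearch process and the node needs to be restarted under the new limit.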