Too many open files during shard allocation

Dear Community,

For the past few days I have had unassigned shards due to allocation failures, so the cluster goes yellow and sometimes red when the Elasticsearch service goes down.

I have the following architecture:

1 master node
2 data nodes (one of which is also the master)
2 client nodes

4 Elasticsearch nodes in total.

1.7 k indices
1.9 b documents
2.9 TB of data
The index configuration on the data nodes is: 5 primary shards + 1 replica.

[root@elastic-xx ~]# curl -X GET http://elastic-xx.domain.local:9200/_cluster/allocation/explain?pretty
{
  "index" : "winlogbeat-2018.08.20",
  "shard" : 2,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "ALLOCATION_FAILED",
    "at" : "2018-08-20T18:59:53.330Z",
    "failed_allocation_attempts" : 5,
    "details" : "failed shard on node [uq3vVKvkRPKC94OOgfyrCA]: failed recovery, failure RecoveryFailedException[[winlogbeat-2018.08.20][2]: Recovery failed from {elastic-02}{QrlS-a6EThqug57OnWPdmg}{oOMM9zM_QESlyThKrefNGA}{xxx.xxx.xxx.xxx}{xxx.xxx.xxx.xxx:9300} into {elastic-01}{uq3vVKvkRPKC94OOgfyrCA}{2anUChAIQF-xHvw4kUzBRA}{xxx.xxx.xxx.xxx}{xxx.xxx.xxx.xxx:9300}]; nested: RemoteTransportException[[elastic-xx][xxx.xxx.xxx.xxx:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: RemoteTransportException[[elastic-xx][xxx.xxx.xxx.xxx:9300][internal:index/shard/recovery/prepare_translog]]; nested: EngineCreationFailureException[failed to create engine]; nested: FileSystemException[/elk/elasticsearch/nodes/0/indices/8w4QBGSVTRq7jopP-n7L-w/2/translog/translog-3886.ckp: Too many open files]; ",
    "last_allocation_status" : "no_attempt" 
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "QrlS-a6EThqug57OnWPdmg",
      "node_name" : "elastic-xx",
      "transport_address" : "xxx.xxx.xxx.xxx:9300",
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "max_retry",
          "decision" : "NO",
          "explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2018-08-20T18:59:53.330Z], failed_attempts[5], delayed=false, details[failed shard on node [uq3vVKvkRPKC94OOgfyrCA]: failed recovery, failure RecoveryFailedException[[winlogbeat-2018.08.20][2]: Recovery failed from {elastic-02}{QrlS-a6EThqug57OnWPdmg}{oOMM9zM_QESlyThKrefNGA}{10.10.68.24}{10.10.68.24:9300} into {elastic-01}{uq3vVKvkRPKC94OOgfyrCA}{2anUChAIQF-xHvw4kUzBRA}{xxx.xxx.xxx.xxx}{xxx.xxx.xxx.xxx:9300}]; nested: RemoteTransportException[[elastic-xx][xxx.xxx.xxx.xxx:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: RemoteTransportException[[elastic-xx][xxx.xxx.xxx.xxx:9300][internal:index/shard/recovery/prepare_translog]]; nested: EngineCreationFailureException[failed to create engine]; nested: FileSystemException[/elk/elasticsearch/nodes/0/indices/8w4QBGSVTRq7jopP-n7L-w/2/translog/translog-3886.ckp: Too many open files]; ], allocation_status[no_attempt]]]" 
        },
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[winlogbeat-2018.08.20][2], node[QrlS-a6EThqug57OnWPdmg], [P], s[STARTED], a[id=EFCtso3bSP2dDU9VbbDfOw]]" 
        }
      ]
    },
    {
      "node_id" : "uq3vVKvkRPKC94OOgfyrCA",
      "node_name" : "elastic-xx",
      "transport_address" : "xxx.xxx.xxx.xxx:9300",
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "max_retry",
          "decision" : "NO",
          "explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2018-08-20T18:59:53.330Z], failed_attempts[5], delayed=false, details[failed shard on node [uq3vVKvkRPKC94OOgfyrCA]: failed recovery, failure RecoveryFailedException[[winlogbeat-2018.08.20][2]: Recovery failed from {elastic-02}{QrlS-a6EThqug57OnWPdmg}{oOMM9zM_QESlyThKrefNGA}{xxx.xxx.xxx.xxx}{xxx.xxx.xxx.xxx:9300} into {elastic-01}{uq3vVKvkRPKC94OOgfyrCA}{2anUChAIQF-xHvw4kUzBRA}{xxx.xxx.xxx.xxx}{xxx.xxx.xxx.xxx:9300}]; nested: RemoteTransportException[[elastic-xx][xxx.xxx.xxx.xxx:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: RemoteTransportException[[elastic-01][xxx.xxx.xxx.xxx:9300][internal:index/shard/recovery/prepare_translog]]; nested: EngineCreationFailureException[failed to create engine]; nested: FileSystemException[/elk/elasticsearch/nodes/0/indices/8w4QBGSVTRq7jopP-n7L-w/2/translog/translog-3886.ckp: Too many open files]; ], allocation_status[no_attempt]]]" 
        }
      ]
    }
  ]
}
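
For reference, the max_retry decider above points at the manual retry endpoint; a minimal check-and-retry sequence (same hostname as in the command above) would be something like:

# Current vs. maximum open file descriptors per node
curl -X GET 'http://elastic-xx.domain.local:9200/_nodes/stats/process?filter_path=**.open_file_descriptors,**.max_file_descriptors&pretty'

# Once the open-files problem is fixed, retry the shards that exceeded the 5 failed allocation attempts
curl -X POST 'http://elastic-xx.domain.local:9200/_cluster/reroute?retry_failed=true'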

All the nodes in the cluster have max_file_descriptors set to 65536:

{
    "nodes": {
        "DIyPbW4WQoSHFrtAYc0gmA": {
            "process": {
                "max_file_descriptors": 65536
            }
        },
        "QrlS-a6EThqug57OnWPdmg": {
            "process": {
                "max_file_descriptors": 65536
            }
        },
        "uq3vVKvkRPKC94OOgfyrCA": {
            "process": {
                "max_file_descriptors": 65536
            }
        },
        "zxSDt-cRTeOIQwNWzJgZWA": {
            "process": {
                "max_file_descriptors": 65536
            }
        }
    }
}
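
As a stop-gap while the shard count is being reduced, the per-process limit can also be raised. Assuming a systemd-managed RPM/DEB install with the default service name (the root prompt above suggests one), an override along these lines would do it; the value 131072 is just an example:

# /etc/systemd/system/elasticsearch.service.d/override.conf
[Service]
LimitNOFILE=131072

# Reload systemd and restart the service so the new limit applies
systemctl daemon-reload
systemctl restart elasticsearch

# Verify what the JVM actually sees
curl -X GET 'http://elastic-xx.domain.local:9200/_nodes/process?filter_path=**.max_file_descriptors&pretty'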

I thought about adding a data node, but I do not want to have more replicas and consume much more storage.

Do you have any recommendations on how to optimise and fix the issue?

Thanks a lot for your help.

Best Regards, Edouard Fazenda.

You have far too many indices and shards given the size of your data and cluster. Please read this blog post for some practical guidance around shards and sharding.

Thank you, I will review this article and get back to you if I have more questions.

OK, I understand: I have 8400 shards due to the 5 shards + 1 replica configuration on 2 data nodes.

The 2 data nodes each have 24 GB of heap for Elasticsearch.

The blog article says to aim for at most 25 shards per GB of heap, so that is about 25 * 24 = 600 shards per node; as I have 2 data nodes, that gives roughly 1200 shards for the cluster. I effectively have far, far too many shards.

If I am not wrong, to respect the Elasticsearch recommendation I would need 14 data nodes with 24 GB of heap each.
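
To check the actual totals against that guideline, something like this should show the cluster-wide shard counts and the heap per node (same hostname as before):

# Cluster-wide shard counts
curl -X GET 'http://elastic-xx.domain.local:9200/_cluster/health?filter_path=active_primary_shards,active_shards,unassigned_shards&pretty'

# Heap per node, to plug into the shards-per-GB-of-heap rule of thumb
curl -X GET 'http://elastic-xx.domain.local:9200/_cat/nodes?v&h=name,heap.max'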

Or I can reduce the number of shards on the cluster via the shrink API, as sketched below:
https://www.elastic.co/guide/en/elasticsearch/reference/6.3/indices-shrink-index.html
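
A rough, untested sketch (index and node names taken from the allocation output above; the target index name is just an example):

# 1. Relocate every copy of the index to one data node and block writes
curl -X PUT 'http://elastic-xx.domain.local:9200/winlogbeat-2018.08.20/_settings' -H 'Content-Type: application/json' -d '
{
  "settings": {
    "index.routing.allocation.require._name": "elastic-01",
    "index.blocks.write": true
  }
}'

# 2. Shrink into a new index with a single primary shard
curl -X POST 'http://elastic-xx.domain.local:9200/winlogbeat-2018.08.20/_shrink/winlogbeat-2018.08.20-shrunk' -H 'Content-Type: application/json' -d '
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1
  }
}'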

And go ahead with a stricter retention policy.

Am I right?

Thanks.

It looks like your average shard size is quite small, so I would recommend the following:

  • Change your index templates to have a single primary shard.
  • Consider switching from daily to perhaps weekly or monthly indices in order to get the average shard size up.
  • Delete data that is no longer needed and use the shrink API to shrink existing indices down to a single primary shard.
  • If this is not enough, you may need to reindex older data into indices covering a longer time period.
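
For example, a template that puts new winlogbeat indices on a single primary shard, plus a reindex of one month of daily indices into a single monthly index, could look roughly like this (template name, order value and target index name are placeholders to adapt):

# Template matching winlogbeat-*; only affects indices created after it is installed
curl -X PUT 'http://elastic-xx.domain.local:9200/_template/winlogbeat-single-shard' -H 'Content-Type: application/json' -d '
{
  "index_patterns": ["winlogbeat-*"],
  "order": 10,
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1
  }
}'

# Reindex the August daily indices into one monthly index, then delete the dailies once verified
curl -X POST 'http://elastic-xx.domain.local:9200/_reindex' -H 'Content-Type: application/json' -d '
{
  "source": { "index": "winlogbeat-2018.08.*" },
  "dest":   { "index": "winlogbeat-2018.08" }
}'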

OK, thanks a lot for the recommendations!
