Elasticsearch: too many open files

Hello Team

I am facing a serious problem with Elasticsearch. I am getting the error below in the Elasticsearch logs:

/opt/SOME_DIRECTORY_NAME/elasticsearch/data/nodes/0/indices/7QRpS9mnSwCHPdBwaLUJag/3/translog/translog-53.ckp: Too many open files

Because of this error, Elasticsearch fails to initialize.

According to the lsof command, more than 40,000,000 files are open.
When I stop the Elasticsearch process, only 2k to 4k files remain open.

When the Elasticsearch process is launched, many of the files appear to be opened multiple times.
For example, lsof shows the file below 60 to 70 times:
/opt/SOME_DIRECTORY_NAME/elasticsearch/data/nodes/0/indices/zmE65g1QRcCJ6rQIp97w8g/0/translog/translog-107.ckp

This is happening for all the files under this directory:
/opt/SOME_DIRECTORY_NAME/elasticsearch/data/nodes/0/indices/

`ulimit -a` output:

core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 63457
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1000000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 4096
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

One strange thing I have noticed is that a very large number of files are created in the directory below:
/opt/SOME_DIRECTORY_NAME/elasticsearch/data/nodes/0/
All Elasticsearch index data is stored in this directory.

More than 400,000 files exist in this directory:
find /opt/SOME_DIRECTORY_NAME/elasticsearch/data/nodes/0/ -type f| wc -l
400000

lsof isn't a reliable way to determine the number of open files: in a multi-threaded application like Elasticsearch it shows an entry per open file per thread which multiplies the true number by a large factor. The actual figure is available from GET _nodes/stats/fs.
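For illustration, something along these lines shows the gap between lsof's per-thread rows and the distinct file count, and asks the node itself for its filesystem stats. `ES_PID`, `ES_URL`, and the credentials are placeholders, not values from this thread:

```shell
# Placeholders: set ES_PID to the Elasticsearch PID and ES_URL to the
# cluster address before running; both are illustrative, not from this thread.
if [ -n "${ES_PID:-}" ]; then
  # On Linux, lsof prints one row per task (thread) per open file,
  # so the raw line count is inflated by the thread count.
  echo "lsof rows (per-thread): $(lsof -p "$ES_PID" | wc -l)"
  echo "distinct files:         $(lsof -p "$ES_PID" | awk '{print $NF}' | sort -u | wc -l)"
fi
if [ -n "${ES_URL:-}" ]; then
  # The node's own view, as David suggests.
  curl -s -k -u username:password "$ES_URL/_nodes/stats/fs?pretty"
fi
```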

Can you share the output of GET _cluster/health for your cluster please?

Hello sir, please find the output of the `GET _cluster/health` command:
{
"cluster_name" : "data-cluster",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 2723,
"active_shards" : 2723,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 2720,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 50.02755833180231
}

That's far too many shards for a single node. The manual contains instructions for reducing your shard count.
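As one hedged sketch for keeping future indices small (assuming the legacy `_template` API from the 6.x era this thread appears to be from, and placeholder `ES_URL`/credentials), a template can default new `akshay-*` indices to one shard and no replicas:

```shell
# Sketch: default new akshay-* indices to 1 shard / 0 replicas via a
# legacy index template. "akshay-single-shard" is a made-up template name;
# ES_URL and the credentials are placeholders.
if [ -n "${ES_URL:-}" ]; then
  curl -s -k -u username:password -X PUT "$ES_URL/_template/akshay-single-shard" \
    -H 'Content-Type: application/json' -d '{
      "index_patterns": ["akshay-*"],
      "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
      }
    }'
fi
```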


Thanks @DavidTurner.
Now I am trying to recover the old data, and there are more than 2000 indices.
I have set number_of_shards=1 and number_of_replicas=0,
and I am trying to reindex all the data using the script below.

#!/bin/bash
variable=`curl -k -XGET -u username:password \
https://localhost:9200/_cat/indices/akshay* | awk '{print $3}'`
echo $variable
for i in $variable
do
    action=`curl -XPOST -u username:password "https://localhost:9200/_reindex?pretty" -H 'Content-Type: application/json' -d"{\"source\":{\"index\": \"${i}\"},\"dest\": {\"index\": \"${i}_new\"}}"`

    curl  -XDELETE -u username:password https://localhost:9200/${i} --insecure

    action1=`curl  -XPOST -u username:password "https://localhost:9200/_reindex?pretty" -H 'Content-Type: application/json' -d"{\"source\":{\"index\": \"${i}_new\"},\"dest\": {\"index\": \"${i}\"}}"`

    curl  -XDELETE -u username:password https://localhost:9200/${i}_new

done

The script works as expected when the index is large, but when the index is small, the small indices end up deleted.

I am getting the error below for

action1=`curl  -XPOST -u username:password "https://localhost:9200/_reindex?pretty" -H 'Content-Type: application/json' -d"{\"source\":{\"index\": \"${i}_new\"},\"dest\": {\"index\": \"${i}\"}}"`

{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"akshay-dindure1_n_new","index_uuid":"na","index":"akshay-dindure1_n_new"}],"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"akshay-dindure1_n_new","index_uuid":"na","index":"akshay-dindure1_n_new"},"status":404}

Seems pretty risky to delete the old index without checking that the reindex succeeded first.
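A sketch of a safer loop, assuming the same endpoints and credentials as the script above: it treats a reindex as successful only when the response body has no "error" key and an empty "failures" array (a crude string check, not full JSON parsing), and skips the delete otherwise. `reindex_ok` is a made-up helper name; `ES_URL` is a placeholder.

```shell
#!/bin/bash
# Crude success check on a _reindex response body: no "error" key and an
# empty "failures" array. String matching only, not a JSON parse.
reindex_ok() {
  case "$1" in
    *'"error"'*) return 1 ;;
    *'"failures":[]'*|*'"failures" : [ ]'*) return 0 ;;
    *) return 1 ;;
  esac
}

# ES_URL and the credentials are placeholders; set ES_URL to run for real.
if [ -n "${ES_URL:-}" ]; then
  for i in $(curl -s -k -u username:password "$ES_URL/_cat/indices/akshay*" | awk '{print $3}'); do
    resp=$(curl -s -k -XPOST -u username:password "$ES_URL/_reindex" \
      -H 'Content-Type: application/json' \
      -d"{\"source\":{\"index\":\"${i}\"},\"dest\":{\"index\":\"${i}_new\"}}")
    if reindex_ok "$resp"; then
      curl -s -k -XDELETE -u username:password "$ES_URL/${i}"
    else
      echo "reindex of ${i} failed, keeping original: $resp" >&2
    fi
  done
fi
```

The same check would apply to the second reindex (copying `${i}_new` back to `${i}`) before deleting the `_new` index.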