Too many open files on bitnami virtual appliance

I have a single cluster with one node, running Elasticsearch, Logstash, and Kibana.
It is a Bitnami virtual appliance.

The system was working fine until we had to reboot the machine; since then, Elasticsearch cannot start up normally.

The open files limit is already 65000, and Bitnami ships with this setting already configured.
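
For reference, 65000 is the per-process limit; the error further down says "Too many open files in system", which refers to the kernel-wide limit instead. Both can be checked with standard Linux commands:

ulimit -n                    # per-process limit for the current user/shell
cat /proc/sys/fs/file-max    # kernel-wide limit ("in system" errors hit this one)
cat /proc/sys/fs/file-nr     # handles currently allocated vs. the maximum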

When I check the cluster health, I see this:

curl -XGET 'localhost:9200/_cluster/health?pretty'
{
  "cluster_name" : "elasticsearch",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 20,
  "active_shards" : 20,
  "relocating_shards" : 0,
  "initializing_shards" : 4,
  "unassigned_shards" : 17798,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 3,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 51,
  "active_shards_percent_as_number" : 0.1122208506340478
}

When it reaches about 8000 active shards (around 47%), the open files errors start:

curl -XGET 'localhost:9200/_cluster/health?pretty'
{
  "cluster_name" : "elasticsearch",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 8675,
  "active_shards" : 8675,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 9147,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 7,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 436,
  "active_shards_percent_as_number" : 48.675793962518235
}

The log is below:

[2017-08-28T13:36:22,415][INFO ][o.e.n.Node               ] [] initializing ...
[2017-08-28T13:36:22,524][INFO ][o.e.e.NodeEnvironment    ] [dwVG3uu] using [1] data paths, mounts [[/opt (192.168.10.230:/services)]], net usable_space [28.1gb], net total_space [49.2gb], spins? [possibly], types [nfs]
[2017-08-28T13:36:22,524][INFO ][o.e.e.NodeEnvironment    ] [dwVG3uu] heap size [7.9gb], compressed ordinary object pointers [true]
[2017-08-28T13:39:21,359][INFO ][o.e.n.Node               ] node name [dwVG3uu] derived from node ID [dwVG3uuDRiGbSkTFqaiuyw]; set [node.name] to override
[2017-08-28T13:39:21,362][INFO ][o.e.n.Node               ] version[5.2.2], pid[11603], build[f9d9b74/2017-02-24T17:26:45.835Z], OS[Linux/3.13.0-110-generic/amd64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_121/25.121-b13]
[2017-08-28T13:39:21,908][INFO ][o.e.p.PluginsService     ] [dwVG3uu] loaded module [aggs-matrix-stats]
[2017-08-28T13:39:21,908][INFO ][o.e.p.PluginsService     ] [dwVG3uu] loaded module [ingest-common]
[2017-08-28T13:39:21,908][INFO ][o.e.p.PluginsService     ] [dwVG3uu] loaded module [lang-expression]
[2017-08-28T13:39:21,908][INFO ][o.e.p.PluginsService     ] [dwVG3uu] loaded module [lang-groovy]
[2017-08-28T13:39:21,908][INFO ][o.e.p.PluginsService     ] [dwVG3uu] loaded module [lang-mustache]
[2017-08-28T13:39:21,908][INFO ][o.e.p.PluginsService     ] [dwVG3uu] loaded module [lang-painless]
[2017-08-28T13:39:21,908][INFO ][o.e.p.PluginsService     ] [dwVG3uu] loaded module [percolator]
[2017-08-28T13:39:21,908][INFO ][o.e.p.PluginsService     ] [dwVG3uu] loaded module [reindex]
[2017-08-28T13:39:21,908][INFO ][o.e.p.PluginsService     ] [dwVG3uu] loaded module [transport-netty3]
[2017-08-28T13:39:21,908][INFO ][o.e.p.PluginsService     ] [dwVG3uu] loaded module [transport-netty4]
[2017-08-28T13:39:21,909][INFO ][o.e.p.PluginsService     ] [dwVG3uu] no plugins loaded
[2017-08-28T13:39:37,929][INFO ][o.e.n.Node               ] initialized
[2017-08-28T13:39:37,929][INFO ][o.e.n.Node               ] [dwVG3uu] starting ...
[2017-08-28T13:39:38,019][INFO ][o.e.t.TransportService   ] [dwVG3uu] publish_address {192.168.10.137:9300}, bound_addresses {[::]:9300}
[2017-08-28T13:39:38,023][INFO ][o.e.b.BootstrapChecks    ] [dwVG3uu] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-08-28T13:39:41,057][INFO ][o.e.c.s.ClusterService   ] [dwVG3uu] new_master {dwVG3uu}{dwVG3uuDRiGbSkTFqaiuyw}{ff8GSMa9SPWL3GQZlcrFIA}{192.168.10.137}{192.168.10.137:9300}, reason: zen-disco-elected-as-master ([0] nodes joined)
[2017-08-28T13:39:41,069][INFO ][o.e.h.HttpServer         ] [dwVG3uu] publish_address {192.168.10.137:9200}, bound_addresses {[::]:9200}
[2017-08-28T13:39:41,070][INFO ][o.e.n.Node               ] [dwVG3uu] started
[2017-08-28T13:40:16,010][INFO ][o.e.g.GatewayService     ] [dwVG3uu] recovered [1783] indices into cluster_state

java.nio.file.NoSuchFileException: /opt/bitnami/elasticsearch/data/nodes/0/indices/-celQmnuQT2GYEVQtQmjLg/2/translog/translog-5008849271234023059.tlog
Caused by: org.elasticsearch.index.shard.IndexShardRecoveryException: failed to fetch index version after copying it over
Caused by: org.elasticsearch.index.shard.IndexShardRecoveryException: shard allocated for local recovery (post api), should exist, but doesn't, current files: _unknown_ (failure=FileSystemException[/opt/bitnami/elasticsearch/data/nodes/0/indices/-celQmnuQT2GYEVQtQmjLg/0/index: Too many open files in system])          
Caused by: java.nio.file.FileSystemException: /opt/bitnami/elasticsearch/data/nodes/0/indices/-celQmnuQT2GYEVQtQmjLg/0/index: Too many open files in system

You have far too many shards for a single node. Each shard has some overhead in terms of memory and file handles, so the volume of shards you have will tie up a lot of resources unnecessarily.
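
For context, the _cat APIs (available in 5.x) show how those shards break down across indices:

curl -s 'localhost:9200/_cat/shards' | wc -l     # total shard count, primaries plus replicas
curl -s 'localhost:9200/_cat/indices?v' | head   # per-index shard counts and sizes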

I would suggest reducing the number of shards by at least an order of magnitude. You can do this by:

- deleting indices you no longer need
- reindexing many small indices into fewer, larger ones with the Reindex API
- shrinking indices that have more primary shards than necessary with the Shrink API (a sketch of the latter two follows this list)
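
Here is a rough sketch of those two API calls, assuming hypothetical Logstash daily index names (adjust them to your own). Shrinking first requires making the index read-only; the requirement that all of its shards sit on one node is already met on a single-node cluster:

# Shrink a multi-shard daily index down to a single shard (index names are hypothetical)
curl -XPUT 'localhost:9200/logstash-2017.08.01/_settings' -d '
{ "index.blocks.write": true }'
curl -XPOST 'localhost:9200/logstash-2017.08.01/_shrink/logstash-2017.08.01-shrunk' -d '
{ "settings": { "index.number_of_shards": 1, "index.number_of_replicas": 0 } }'

# Or reindex several small daily indices into one larger monthly index
curl -XPOST 'localhost:9200/_reindex' -d '
{ "source": { "index": ["logstash-2017.08.01", "logstash-2017.08.02"] }, "dest": { "index": "logstash-2017.08" } }'

# After verifying the new index, delete the originals to free file handles
curl -XDELETE 'localhost:9200/logstash-2017.08.01,logstash-2017.08.02'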

If the node is not in a state where you can perform these operations, you may first need to increase limits, and possibly even scale the cluster up or out, before you can address the shard count.
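
Since the error in the log says "Too many open files in system", it is the kernel-wide limit rather than the per-process 65000 that is exhausted, so raising that is the quickest way to get the node healthy enough to work on. A rough sketch, with an assumed target value, and assuming the Elasticsearch system user is named elasticsearch (the name may differ on the Bitnami appliance):

# Raise the kernel-wide file handle limit (500000 is an assumed value; size it to your shard count)
sudo sysctl -w fs.file-max=500000
echo 'fs.file-max = 500000' | sudo tee -a /etc/sysctl.conf    # persist across reboots

# If the per-process limit also needs raising, add a line to /etc/security/limits.conf:
#   elasticsearch  -  nofile  100000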

Thank you

Can I do this before Elasticsearch starts (offline), since it does not start? Or do I have to delete some indices first, start Elasticsearch back up, and then do the reindex and shrink?
