I have a single cluster with a single node, running Elasticsearch, Logstash, and Kibana.
It is a Bitnami virtual appliance.
The system was working fine until we had to reboot the machine; since then, Elasticsearch cannot start up normally.
The open-files limit is already 65000, and Bitnami ships with that configuration already set.
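To confirm that the Elasticsearch process really gets that limit (and not just the shell), the following checks can be run. The node stats call is a standard Elasticsearch 5.x API; the printed values are machine-specific:

```shell
# Per-process open-files limit for the current shell
ulimit -n

# System-wide ceiling on open file handles (Linux only)
cat /proc/sys/fs/file-max

# What the running Elasticsearch node actually got; requires the node to be up
curl -s 'localhost:9200/_nodes/stats/process?filter_path=**.max_file_descriptors&pretty' || true
```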
When I check the cluster health, I see this:
curl -XGET 'localhost:9200/_cluster/health?pretty'
{
  "cluster_name" : "elasticsearch",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 20,
  "active_shards" : 20,
  "relocating_shards" : 0,
  "initializing_shards" : 4,
  "unassigned_shards" : 17798,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 3,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 51,
  "active_shards_percent_as_number" : 0.1122208506340478
}
When it reaches around 8000 shards (about 47% active), the "open files" errors start:
curl -XGET 'localhost:9200/_cluster/health?pretty'
{
  "cluster_name" : "elasticsearch",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 8675,
  "active_shards" : 8675,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 9147,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 7,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 436,
  "active_shards_percent_as_number" : 48.675793962518235
}
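Adding the active and unassigned shards from the output above gives the total this single node has to open. The per-shard file count below is an assumption (segment counts vary widely per shard); it is only meant to show the order of magnitude against a 65000 limit:

```python
# Shard counts taken from the cluster health output above
active = 8675
unassigned = 9147
total_shards = active + unassigned
print(total_shards)  # 17822 shards on one node

# Assumed lower bound: translog plus a handful of segment files per shard
files_per_shard = 10
print(total_shards * files_per_shard)  # 178220, far above a 65000 limit
```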
The log is below:
[2017-08-28T13:36:22,415][INFO ][o.e.n.Node ] [] initializing ...
[2017-08-28T13:36:22,524][INFO ][o.e.e.NodeEnvironment ] [dwVG3uu] using [1] data paths, mounts [[/opt (192.168.10.230:/services)]], net usable_space [28.1gb], net total_space [49.2gb], spins? [possibly], types [nfs]
[2017-08-28T13:36:22,524][INFO ][o.e.e.NodeEnvironment ] [dwVG3uu] heap size [7.9gb], compressed ordinary object pointers [true]
[2017-08-28T13:39:21,359][INFO ][o.e.n.Node ] node name [dwVG3uu] derived from node ID [dwVG3uuDRiGbSkTFqaiuyw]; set [node.name] to override
[2017-08-28T13:39:21,362][INFO ][o.e.n.Node ] version[5.2.2], pid[11603], build[f9d9b74/2017-02-24T17:26:45.835Z], OS[Linux/3.13.0-110-generic/amd64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_121/25.121-b13]
[2017-08-28T13:39:21,908][INFO ][o.e.p.PluginsService ] [dwVG3uu] loaded module [aggs-matrix-stats]
[2017-08-28T13:39:21,908][INFO ][o.e.p.PluginsService ] [dwVG3uu] loaded module [ingest-common]
[2017-08-28T13:39:21,908][INFO ][o.e.p.PluginsService ] [dwVG3uu] loaded module [lang-expression]
[2017-08-28T13:39:21,908][INFO ][o.e.p.PluginsService ] [dwVG3uu] loaded module [lang-groovy]
[2017-08-28T13:39:21,908][INFO ][o.e.p.PluginsService ] [dwVG3uu] loaded module [lang-mustache]
[2017-08-28T13:39:21,908][INFO ][o.e.p.PluginsService ] [dwVG3uu] loaded module [lang-painless]
[2017-08-28T13:39:21,908][INFO ][o.e.p.PluginsService ] [dwVG3uu] loaded module [percolator]
[2017-08-28T13:39:21,908][INFO ][o.e.p.PluginsService ] [dwVG3uu] loaded module [reindex]
[2017-08-28T13:39:21,908][INFO ][o.e.p.PluginsService ] [dwVG3uu] loaded module [transport-netty3]
[2017-08-28T13:39:21,908][INFO ][o.e.p.PluginsService ] [dwVG3uu] loaded module [transport-netty4]
[2017-08-28T13:39:21,909][INFO ][o.e.p.PluginsService ] [dwVG3uu] no plugins loaded
[2017-08-28T13:39:37,929][INFO ][o.e.n.Node ] initialized
[2017-08-28T13:39:37,929][INFO ][o.e.n.Node ] [dwVG3uu] starting ...
[2017-08-28T13:39:38,019][INFO ][o.e.t.TransportService ] [dwVG3uu] publish_address {192.168.10.137:9300}, bound_addresses {[::]:9300}
[2017-08-28T13:39:38,023][INFO ][o.e.b.BootstrapChecks ] [dwVG3uu] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-08-28T13:39:41,057][INFO ][o.e.c.s.ClusterService ] [dwVG3uu] new_master {dwVG3uu}{dwVG3uuDRiGbSkTFqaiuyw}{ff8GSMa9SPWL3GQZlcrFIA}{192.168.10.137}{192.168.10.137:9300}, reason: zen-disco-elected-as-master ([0] nodes joined)
[2017-08-28T13:39:41,069][INFO ][o.e.h.HttpServer ] [dwVG3uu] publish_address {192.168.10.137:9200}, bound_addresses {[::]:9200}
[2017-08-28T13:39:41,070][INFO ][o.e.n.Node ] [dwVG3uu] started
[2017-08-28T13:40:16,010][INFO ][o.e.g.GatewayService ] [dwVG3uu] recovered [1783] indices into cluster_state
java.nio.file.NoSuchFileException: /opt/bitnami/elasticsearch/data/nodes/0/indices/-celQmnuQT2GYEVQtQmjLg/2/translog/translog-5008849271234023059.tlog
Caused by: org.elasticsearch.index.shard.IndexShardRecoveryException: failed to fetch index version after copying it over
Caused by: org.elasticsearch.index.shard.IndexShardRecoveryException: shard allocated for local recovery (post api), should exist, but doesn't, current files: _unknown_ (failure=FileSystemException[/opt/bitnami/elasticsearch/data/nodes/0/indices/-celQmnuQT2GYEVQtQmjLg/0/index: Too many open files in system])
Caused by: java.nio.file.FileSystemException: /opt/bitnami/elasticsearch/data/nodes/0/indices/-celQmnuQT2GYEVQtQmjLg/0/index: Too many open files in system
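Note that the error says "Too many open files in system" rather than the plain per-process "Too many open files", which I read as the kernel-wide file table being exhausted rather than the 65000 per-process limit. That is only a hunch, but on Linux it can be checked with:

```shell
# Allocated, unused, and maximum kernel file handles (three columns)
cat /proc/sys/fs/file-nr

# The same maximum via sysctl
sysctl fs.file-max
```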