We've had the same max-open-files setting (65k) on our ES clusters for a while, and our nodes are fairly large (i3.2xls with roughly 1-1.5TB of data each). We never had a problem with file handles on our older ES 2.4 cluster.
On our new ES 6.2.3 cluster, though, we noticed that some nodes were crashing at random. Digging in, here's the odd graph of open file handles:
You can see that every once in a while a node seems to get stuck and starts leaking file handles. It crashes when it hits 65k, then goes back to operating normally.
Any thoughts on what to look for? I am bumping up the file handle limits to see if these things plateau on their own or not.
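In case it helps anyone watching the same symptom, here's a minimal sketch of how we've been polling per-node file descriptor usage from the nodes stats API. The endpoint URL and the lack of authentication are assumptions; adjust for your cluster.

```python
# Hypothetical monitoring snippet: poll each node's file descriptor usage
# via GET /_nodes/stats/process. Assumes the cluster is reachable at
# http://localhost:9200 with no authentication.
import requests

ES_URL = "http://localhost:9200"  # adjust for your cluster

resp = requests.get(f"{ES_URL}/_nodes/stats/process")
resp.raise_for_status()

for node_id, node in resp.json()["nodes"].items():
    proc = node["process"]
    open_fds = proc["open_file_descriptors"]
    max_fds = proc["max_file_descriptors"]
    pct = 100.0 * open_fds / max_fds
    print(f"{node['name']}: {open_fds}/{max_fds} open file descriptors ({pct:.1f}%)")
```

Running that on a cron and graphing the output is what produced the chart above.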
This is a known issue caused by an endless-flush bug: a shard gets stuck repeatedly flushing and opening a new translog generation, and each new generation holds another file open, which is what drives the handle count up. The bug is fixed in 6.2.4; you should no longer see this once you upgrade.
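If you want to confirm a shard is stuck in that loop before upgrading, one hedged way is to compare per-index flush counts over a short window; a healthy index flushes occasionally, while an affected one racks up flushes continuously. The endpoint, interval, and threshold below are illustrative assumptions, not a definitive diagnostic.

```python
# Hypothetical check for the runaway-flush symptom: sample per-index flush
# totals twice (GET /_stats/flush) and flag indices whose count jumps.
# Assumes the same http://localhost:9200 endpoint as above.
import time
import requests

ES_URL = "http://localhost:9200"

def flush_totals():
    stats = requests.get(f"{ES_URL}/_stats/flush").json()
    return {name: data["total"]["flush"]["total"]
            for name, data in stats["indices"].items()}

before = flush_totals()
time.sleep(60)  # sample again a minute later
after = flush_totals()

for index, count in after.items():
    delta = count - before.get(index, 0)
    if delta > 100:  # arbitrary threshold; a healthy index flushes far less often
        print(f"{index}: {delta} flushes in the last minute -- possible endless-flush loop")
```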
We completed the migration and haven't seen the issue on the new nodes, so thanks. We are now having a Kibana issue with 6.2.4 on the Monitoring UI; I opened a separate thread in the Kibana forum about that. Thanks for the quick reply.