We've had the same max-open-files setting (65k) on our ES clusters for a while, and our nodes are pretty large (i3.2xlarge instances with roughly 1-1.5 TB of data on each one). We've never had a problem with file handles on our older ES 2.4 cluster.
On our new ES 6.2.3 cluster, though, we noticed that some nodes were crashing at random. Digging in, here's the odd graph of open file handles:
You can see that every once in a while a node seems to get stuck and starts leaking file handles, or something like it. The node crashes at 65k, then goes back to operating OK.
Any thoughts on what to look for? I am bumping up the file handle limits to see whether the counts plateau on their own or not.
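For anyone wanting to watch this without a graph: the node stats API (`GET _nodes/stats/process`) reports `process.open_file_descriptors` and `process.max_file_descriptors` per node. Below is a minimal sketch that flags nodes approaching their limit. The function names, the `http://localhost:9200` address, and the 0.8 warning ratio are assumptions for illustration, not anything from the cluster described above.

```python
import json
import urllib.request


def fetch_process_stats(base_url="http://localhost:9200"):
    """Fetch per-node process stats. The base_url default is an assumption."""
    with urllib.request.urlopen(base_url + "/_nodes/stats/process") as resp:
        return json.load(resp)


def fd_usage(node_stats, warn_ratio=0.8):
    """Return (name, open_fds, max_fds, ratio) for nodes at or above warn_ratio.

    node_stats is the parsed JSON body of GET _nodes/stats/process.
    warn_ratio is a hypothetical threshold; tune it for your own alerting.
    """
    flagged = []
    for node_id, node in node_stats.get("nodes", {}).items():
        proc = node.get("process", {})
        open_fds = proc.get("open_file_descriptors")
        max_fds = proc.get("max_file_descriptors")
        # Skip nodes that do not report usable values.
        if not open_fds or not max_fds or max_fds <= 0:
            continue
        ratio = open_fds / max_fds
        if ratio >= warn_ratio:
            flagged.append((node.get("name", node_id), open_fds, max_fds, round(ratio, 3)))
    # Worst offenders first.
    return sorted(flagged, key=lambda row: row[3], reverse=True)


# Made-up sample in the shape of a node stats response, for illustration only.
sample = {
    "nodes": {
        "a": {"name": "es-data-1",
              "process": {"open_file_descriptors": 61000, "max_file_descriptors": 65536}},
        "b": {"name": "es-data-2",
              "process": {"open_file_descriptors": 12000, "max_file_descriptors": 65536}},
    }
}
print(fd_usage(sample))
```

Against a live cluster you would call `fd_usage(fetch_process_stats())` on a schedule; a node whose count climbs steadily rather than plateauing matches the pattern described above.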
This is a known issue caused by an endless flush bug, where a shard repeatedly flushes and opens a new translog generation each time. It is fixed in 6.2.4; you should no longer see it if you upgrade.
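During a rolling upgrade it can help to confirm that no node is still running a release older than the fix. The nodes info API (`GET _nodes`) includes a `version` string for each node. This is a rough sketch that lists nodes below 6.2.4; the function name and the sample data are hypothetical, and version strings that cannot be parsed are skipped rather than reported.

```python
def nodes_below(nodes_info, minimum=(6, 2, 4)):
    """Return sorted (name, version) pairs for nodes older than `minimum`.

    nodes_info is the parsed JSON body of GET _nodes.
    """
    stale = []
    for node_id, node in nodes_info.get("nodes", {}).items():
        version = node.get("version", "")
        # Drop any qualifier such as "-SNAPSHOT" before comparing.
        core = version.split("-")[0]
        try:
            parsed = tuple(int(part) for part in core.split("."))
        except ValueError:
            # Unparsable or missing version: skipped in this sketch.
            continue
        if parsed < minimum:
            stale.append((node.get("name", node_id), version))
    return sorted(stale)


# Made-up sample in the shape of a nodes info response, for illustration only.
sample_info = {
    "nodes": {
        "a": {"name": "es-data-1", "version": "6.2.4"},
        "b": {"name": "es-data-2", "version": "6.2.3"},
    }
}
print(nodes_below(sample_info))
```

An empty result means every node that reported a parsable version is at or above the minimum.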
Thanks for that - we'll do an upgrade and monitor from there!
We completed the migration and haven't seen the issue on the new nodes, so thanks. We are now having a Kibana issue with the Monitoring UI on 6.2.4; I opened a separate thread in the Kibana forum about that. Thanks for the quick reply, though.
You are welcome. I am glad the translog issue is resolved for you.