First, some setup information:
3 dedicated master nodes, 2 no-data (load-balancing) nodes, and several data boxes, each running 6 ES nodes (30G heap per node, 4TB NVMe disk per node; the boxes have 386 or 512GB of RAM). The Elasticsearch version is 6.8.2. Nodes are installed from the official deb package (with copy-pasted service files to start the different nodes).
openjdk version "1.8.0_222"
4.15.0-58-generic #64-Ubuntu
elasticsearch@hostname$ ulimit -u
2062033
Problem:
At pseudo-random times (once or twice a day), 1-2 nodes per box hit an OutOfMemoryError. There doesn't seem to be any substantial increase in load before the OOM.
The error is as follows:
[2019-09-18T10:27:47,847][WARN ][i.n.c.AbstractChannelHandlerContext] [hostname] An exception 'java.lang.OutOfMemoryError: unable to create new native thread' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:
java.lang.OutOfMemoryError: unable to create new native thread
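Since the error is about creating a native thread rather than about heap, the `ulimit -u` value above is probably only one of several limits worth comparing against the live thread count of each node. Below is a minimal sketch of how those numbers could be collected per node process, assuming Linux and the standard `/proc` files. The helper names (`read_int`, `parse_status_threads`, `parse_max_processes`, `thread_report`) are made up for illustration and are not part of Elasticsearch. It does not cover other possible caps, such as a systemd unit's `TasksMax`.

```python
import sys
from pathlib import Path


def read_int(path):
    """Return the first integer in a file such as /proc/sys/kernel/pid_max, or None."""
    try:
        return int(Path(path).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def parse_status_threads(status_text):
    """Extract the 'Threads:' count from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    return None


def parse_max_processes(limits_text):
    """Extract the soft 'Max processes' limit from the text of /proc/<pid>/limits.

    Note: this limit (RLIMIT_NPROC) is accounted per user, so all ES nodes
    running as the same user on one box count against it together.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max processes"):
            soft = line.split()[2]
            return None if soft == "unlimited" else int(soft)
    return None


def thread_report(pid):
    """Collect the current thread count and related limits for one process."""
    base = Path("/proc") / str(pid)
    return {
        "threads": parse_status_threads((base / "status").read_text()),
        "max_processes": parse_max_processes((base / "limits").read_text()),
        "threads_max": read_int("/proc/sys/kernel/threads-max"),
        "pid_max": read_int("/proc/sys/kernel/pid_max"),
    }


if __name__ == "__main__":
    # Usage (hypothetical file name): python3 thread_report.py <pid> [<pid> ...]
    for pid in sys.argv[1:]:
        print(pid, thread_report(int(pid)))
```

Running this periodically against the PID of each ES node on a box (as the same user or as root) should show whether the thread count climbs toward any of these limits before a node fails.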
The issue is probably something basic that I have missed, but I haven't been able to figure it out so far.
This is my first post, so there might not be enough information and so on :).