Elasticsearch nodes OOM at pseudo-random intervals

First, some setup information:
3 dedicated master nodes, 2 no-data (load-balancing) nodes, and several data boxes, each running 6 ES nodes (30 GB heap and a 4 TB NVMe disk per node; the boxes have 386 or 512 GB of RAM). The Elasticsearch version is 6.8.2. The nodes are installed via the official deb package (with copy-pasted service files for starting the different nodes).
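As a rough sanity check of the memory layout described above, here is a back-of-the-envelope sketch of my own that uses only the node count, heap size and RAM figures listed. The JVM heaps alone take 180 GB per box, and everything else (the OS, page cache and off-heap JVM memory such as thread stacks) has to fit in the remainder.

```python
from typing import Tuple

NODES_PER_BOX = 6      # from the setup description
HEAP_GB_PER_NODE = 30  # from the setup description


def heap_budget(box_ram_gb: int, nodes: int = NODES_PER_BOX,
                heap_gb: int = HEAP_GB_PER_NODE) -> Tuple[int, int]:
    """Return (total JVM heap, RAM left for everything else) in GB for one box."""
    total_heap = nodes * heap_gb
    return total_heap, box_ram_gb - total_heap


for ram_gb in (386, 512):
    heap, rest = heap_budget(ram_gb)
    print(f"{ram_gb} GB box: {heap} GB heap, {rest} GB left for OS, page cache and off-heap memory")
```

This only counts heap; it says nothing about per-thread or direct memory, which is where "native thread" errors come from.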

openjdk version "1.8.0_222"
4.15.0-58-generic #64-Ubuntu
elasticsearch@hostname$ ulimit -u
2062033
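Note that `ulimit -u` in a login shell reports the limit of that shell, which may differ from the limit the running Elasticsearch processes got from their service files. Below is a minimal sketch (Python, assuming the Linux /proc layout) that reads the "Max processes" limit of a running process instead. The function names are hypothetical helpers, not part of any Elasticsearch tooling.

```python
from typing import Optional, Tuple

# None means "unlimited"
Limit = Optional[int]


def _to_limit(value: str) -> Limit:
    """Convert a /proc limits field to an int, or None for 'unlimited'."""
    return None if value == "unlimited" else int(value)


def parse_max_processes(limits_text: str) -> Tuple[Limit, Limit]:
    """Return the (soft, hard) 'Max processes' limits from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max processes"):
            # Remaining columns are: soft limit, hard limit, units
            soft, hard = line[len("Max processes"):].split()[:2]
            return _to_limit(soft), _to_limit(hard)
    raise ValueError("no 'Max processes' line found")


def read_max_processes(pid: int) -> Tuple[Limit, Limit]:
    """Read the process/thread limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_max_processes(f.read())
```

For example, `read_max_processes(<pid of one ES JVM>)` shows what that node actually runs with, which can then be compared to the shell's `ulimit -u` value.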

Problem:
At pseudo-random times (once or twice a day), one or two nodes per box go down with an OutOfMemoryError. There doesn't seem to be any substantial increase in load beforehand.

The error is as follows:
[2019-09-18T10:27:47,847][WARN ][i.n.c.AbstractChannelHandlerContext] [hostname] An exception 'java.lang.OutOfMemoryError: unable to create new native thread' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:
java.lang.OutOfMemoryError: unable to create new native thread
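This error is raised when the JVM fails to create an operating-system thread, so it may be worth checking how many threads all the nodes on a box use together. On Linux, the `RLIMIT_NPROC` ("Max processes") limit counts threads per real user ID, so if all 6 nodes run as the same user they share a single budget. A sketch (Python, hypothetical helper names, assuming /proc is available) that totals the thread counts of every process owned by one UID:

```python
import os
from typing import Dict, Iterable, Iterator


def parse_status(status_text: str) -> Dict[str, str]:
    """Parse /proc/<pid>/status 'Key:\tvalue' lines into a dict."""
    fields = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        fields[key] = value.strip()
    return fields


def count_threads(status_texts: Iterable[str], uid: int) -> int:
    """Sum 'Threads' over all processes whose real UID equals uid."""
    total = 0
    for text in status_texts:
        fields = parse_status(text)
        # 'Uid:' lists real, effective, saved and filesystem UIDs;
        # RLIMIT_NPROC is accounted against the real UID (first column).
        if int(fields["Uid"].split()[0]) == uid:
            total += int(fields["Threads"])
    return total


def read_statuses(proc_root: str = "/proc") -> Iterator[str]:
    """Yield the status file of every process currently in /proc."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                yield f.read()
        except OSError:
            continue  # process exited between listdir() and open()
```

For example, `count_threads(read_statuses(), pwd.getpwnam("elasticsearch").pw_uid)` gives the total for the `elasticsearch` user (assuming that is the account all nodes run as), which can be compared with the "Max processes" value in `/proc/<pid>/limits` of one of the nodes.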

The issue is probably something basic that I have missed, but I haven't been able to figure it out so far.

This is my first post, so I may have left out some information :).
