@ALIT What do you have set on your machine for the value of vm.max_map_count if you run sysctl -a?
We've always had this set to 262144, as per Elastic's recommendation.
I set this to an artificially lower value in our testing environment and waited for this value to be reached by the Elastic process; the process exited with the exact same error I've been seeing.
I've now doubled this value on our clusters to see if it prevents, or delays, the OOMs we've been seeing. I've observed that the number of memory regions in use on some of our hot nodes is already greater than the previous limit, so I'm more confident this is the source of the problem. Whether usage will just grow to the next limit, I don't know.
I would be interested to hear whether you're able to observe the same in your cluster. You can run wc -l /proc/<PID>/maps to see the current number in use.
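If it helps to track this over time, here is a rough Python sketch of the same check: it counts the lines in /proc/<PID>/maps (what wc -l reports) and compares that against the limit in /proc/sys/vm/max_map_count. The function names and the proc_root parameter are my own, added only so the logic can be pointed at a different directory; this assumes a Linux host and is not anything Elastic ships.

```python
from pathlib import Path


def count_mappings(maps_path):
    """Count memory regions: one line per mapping, like `wc -l`."""
    with open(maps_path) as f:
        return sum(1 for _ in f)


def read_limit(limit_path):
    """Read the current vm.max_map_count value from procfs."""
    return int(Path(limit_path).read_text().strip())


def map_usage(pid, proc_root="/proc"):
    """Return (regions in use, limit, fraction of limit used) for a PID.

    proc_root is a hypothetical parameter so the paths can be redirected;
    on a real host leave it at /proc.
    """
    used = count_mappings(f"{proc_root}/{pid}/maps")
    limit = read_limit(f"{proc_root}/sys/vm/max_map_count")
    return used, limit, used / limit


if __name__ == "__main__":
    import sys

    # Pass the Elasticsearch PID as the first argument; "self" inspects
    # this Python process and is only useful as a smoke test.
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    used, limit, frac = map_usage(pid)
    print(f"{used} of {limit} map areas in use ({frac:.1%})")
```

Running it periodically against the Elasticsearch PID on a hot node should show whether the count keeps climbing toward the new limit or levels off. Reading another user's /proc/<PID>/maps may need root or the same user as the process.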