Out of memory (invoked oom-killer)

(Simen Flatby) #1

Elasticsearch version: 2.4.0
Plugins installed: [ "license", "marvel-agent" ]
JVM version: 1.8.0_101
OS version: Ubuntu 16.04.1 LTS
Kernel version: 4.4.0-67-generic #88-Ubuntu
RAM: 14 GB
JVM heap min and max settings: 8GB

Kibana version: 4.6.0
Plugins installed: [ "marvel", "sense" ]
Filebeat version: 5.2.2
nginx version: nginx/1.10.0 (Ubuntu)

Description of the problem including expected versus actual behavior:
This node had been running for months under normal load. The other day Filebeat was installed to ship system logs to a separate Elasticsearch log cluster. About 24 hours after the installation the server ran out of memory, and the syslog shows that filebeat invoked oom-killer, which in turn killed the JVM and Elasticsearch.
The server was rebooted and seemed to work fine for an hour and a half. Then the same thing happened again, but this time it was java that invoked oom-killer. Once again the server was rebooted, and this time Filebeat was stopped. After some days Filebeat was started again, and the server has now been running stably for three days.

I do not know if this problem is caused by Filebeat or if it was a coincidence that it occurred after the installation.
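
For anyone trying to narrow down a similar case, here is a rough sketch of how the syslog could be scanned for the two kernel messages mentioned above: which process invoked oom-killer, and which process was killed. It assumes the usual Linux kernel wording ("<name> invoked oom-killer" and "Killed process <pid> (<name>)"). The function names and the default path /var/log/syslog are my own choices for illustration, not something taken from the actual log.

```python
import re

# Assumed kernel message wording for OOM events; adjust if your kernel differs.
INVOKED_RE = re.compile(r"(\S+) invoked oom-killer")
KILLED_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_events(lines):
    """Return OOM-related events found in an iterable of log lines, in order."""
    events = []
    for line in lines:
        match = INVOKED_RE.search(line)
        if match:
            # The process whose allocation triggered the OOM killer.
            events.append({"kind": "invoked", "process": match.group(1)})
            continue
        match = KILLED_RE.search(line)
        if match:
            # The process the kernel chose to kill.
            events.append({
                "kind": "killed",
                "process": match.group(2),
                "pid": int(match.group(1)),
            })
    return events


def scan_file(path="/var/log/syslog"):
    """Hypothetical helper: scan one log file and return its OOM events."""
    with open(path, errors="replace") as handle:
        return find_oom_events(handle)
```

Calling scan_file() on each node and comparing the output shows whether the invoking process differs between incidents (filebeat the first time, java the second). Note that the invoking process is only the one that happened to request memory at that moment, so it is not necessarily the cause.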

Other things to note:

  • Nothing interesting in the elasticsearch logs or the Filebeat logs.
  • The other nodes in the cluster have been stable since the Filebeat installation. The only difference in their setup is that they all have kernel version 4.4.0-66-generic #87-Ubuntu.
  • When I look at the system load graph in Marvel for the time of the error, the load went from an average of 1.5 to 200.

Steps to reproduce:

  1. Do not know how to reproduce since the error occurred after over 24 hours of uptime with normal load.

Syslog: https://github.com/elastic/elasticsearch/issues/23842

Elasticsearch 5.2.2 : Memory keeps on increasing steadily untill ES gets killed by System OOM Killer
(Simen Flatby) #2

This is solved (for now).

The problem turned out to be Hyper-V specific and occurred when Hyper-V took a snapshot or did a live host migration.

It was caused by a bug in the kernel (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1679898).

The solution is to remove the virtual DVD drives from the VM in Hyper-V (https://technet.microsoft.com/en-us/itpro/powershell/windows/hyper-v/remove-vmdvddrive).

(system) #3

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.