OOM killer triggered and machine crashes after using up all memory


(Jalaluddeen Mohammed) #1

I have a 12 node cluster with 24G RAM running only ELasticsearch.
Elasticsearch is configured to use 11G of RAM as below.

-Xms11g
-Xmx11g

There is no indexing or querying happening on this cluster, this is a fresh deployment, basically sitting idle.
But after for a few days, machine crashes and I have to reboot the machine.
There is no Heap dump or any GC events in the elasticsearch logs.
THis has happened in 3 machines in the cluster now.

Upon checking dmesg it says
//Out of memory: Kill process 19590 (java) score 507 or sacrifice child

Below is the relevant dmesg output.

<>
Node 0 Normal free:45816kB min:59172kB low:73964kB high:88756kB active_anon:8kB inactive_anon:0kB active_file:544kB inactive_file:412kB unevictable:12650952kB isolated(anon):0kB isolated(file):128kB present:21719040kB mlocked:207080kB dir
ty:0kB writeback:0kB mapped:116920kB shmem:0kB slab_reclaimable:9872kB slab_unreclaimable:42008kB kernel_stack:5680kB pagetables:29844kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve: 0 0 0 0
Node 0 DMA: 14kB 18kB 216kB 132kB 264kB 0128kB 0256kB 0512kB 11024kB 12048kB 34096kB = 15564kB
Node 0 DMA32: 12
4kB 98kB 716kB 2932kB 1664kB 4128kB 6256kB 9512kB 81024kB 72048kB 154096kB = 92808kB
Node 0 Normal: 104334kB 238kB 016kB 032kB 064kB 0128kB 0256kB 0512kB 01024kB 02048kB 1*4096kB = 46012kB
29638 total pagecache pages
109 pages in swap cache
Swap cache stats: add 116875, delete 116766, find 4926/7159
Free swap = 631592kB
Total swap = 1042140kB
6291440 pages RAM
139727 pages reserved
35020 pages shared
6079389 pages non-shared
[ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
[ 638] 0 638 2731 69 3 -17 -1000 udevd
[ 1149] 0 1149 2730 69 1 -17 -1000 udevd
[ 1264] 0 1264 2730 66 7 -17 -1000 udevd
[ 1714] 0 1714 7441 132 5 -17 -1000 auditd
[ 1738] 0 1738 1540 121 0 0 0 portreserve
[ 1748] 0 1748 63919 217 1 0 0 rsyslogd
[ 1763] 0 1763 4586 104 2 0 0 irqbalance
[ 1785] 32 1785 4745 124 5 0 0 rpcbind
[ 1809] 29 1809 5838 180 6 0 0 rpc.statd
[ 1839] 0 1839 1671 92 2 0 0 vnstatd
[ 1850] 81 1850 5359 87 6 0 0 dbus-daemon
[ 1888] 0 1888 1019 131 6 0 0 acpid
[ 1900] 68 1900 9581 200 2 0 0 hald
[ 1901] 0 1901 5099 132 1 0 0 hald-runner
[ 1934] 0 1934 5629 119 2 0 0 hald-addon-inpu
[ 1948] 68 1948 4501 162 6 0 0 hald-addon-acpi
[ 2126] 0 2126 43169 249 1 0 0 vmtoolsd
[ 2179] 0 2179 96539 180 2 0 0 automount
[ 2311] 0 2311 7979 124 2 -17 -1000 sshd
[ 2324] 0 2324 5428 161 6 0 0 xinetd
[ 2335] 38 2335 7685 239 7 0 0 ntpd
[ 2481] 0 2481 20253 238 1 0 0 master
[ 2491] 89 2491 20316 238 7 0 0 qmgr
[ 2495] 0 2495 45773 202 3 0 0 abrtd
[ 2523] 0 2523 29221 148 5 0 0 crond
[ 2538] 0 2538 5276 68 4 0 0 atd
[ 2958] 0 2958 148548 232 0 0 0 salt-minion
[ 2959] 0 2959 123917 59 6 0 0 salt-minion
[ 2987] 0 2987 16119 84 0 0 0 certmonger
[ 3020] 0 3020 1015 115 3 0 0 mingetty
[ 3024] 0 3024 1015 115 3 0 0 mingetty
[ 3027] 0 3027 1015 115 7 0 0 mingetty
[ 3030] 0 3030 1015 115 1 0 0 mingetty
[ 3034] 0 3034 1015 115 4 0 0 mingetty
[ 3038] 0 3038 1015 115 3 0 0 mingetty
[19590] 61947 19590 4583234 3162869 1 0 0 java
[12304] 89 12304 20273 225 1 0 0 pickup
Out of memory: Kill process 19590 (java) score 507 or sacrifice child
Killed process 19590, UID 61947, (java) total-vm:18332936kB, anon-rss:12534556kB, file-rss:116920kB
[0]: VMCI: Updating context from (ID=0xea4a4c50) to (ID=0xea4a4c50) on event (type=0).

</>


(Christian Dahlqvist) #2

Elasticsearch used memory in addition to the heap you have configured, and it is generally recommended to give it the same amount of off-heap memory as you have heap. If you have configured the heap to be 11GB, Elasticsearch should ideally according to this rule of thumb have access to 22GB or memory in total. If you have other components or services running on this host it is possible that this is what is causing problems. I also see that you are using swap space, which is something that is not recommended with Elasticsearch.

It might also help if you were able to provide a bit more information about your environment, e.g. size of data and the version used.


(Jalaluddeen Mohammed) #3

Thanks!

I using Elasticsearch version 5.6.16 and have around 480G of data in total from 3 indices, 12 node cluster with 3.4TB total available on each node.

There are no other apps except for salt-minion on the nodes.
OS is Centos 6.10 .. Kernel - 2.6.32-754.10.1.el6.x86_64


(Christian Dahlqvist) #4

Is there anything in the Elasticsearch logs?


(David Turner) #5

This indicates that your off-heap memory is very fragmented. Do you have any nonstandard kernel settings? In particular, what is vm.overcommit_memory set to?


(Jalaluddeen Mohammed) #6

Hi,

/proc/sys/vm/overcommit_memory = 0

/proc/sys/vm/overcommit_ratio = 50

Kernel version 2.6.32-754.10.1.el6.x86_64

OS Centos 6.10

Below are the log entries from Elasticsearch around the time of crash.

[2019-04-12T03:16:50,736][WARN ][o.e.m.j.JvmGcMonitorService] [AAzI9z_] [gc][young][175648][31] duration [12.3s], collections [1]/[12.8s], total [12.3s]/[14.2s], memory [1gb]->[563.1mb]/[10.9gb], all_pools {[young] [524.1mb]->[5.5mb]/[532.5mb]}{[survivor] [237.1kb]->[51.9mb]/[66.5mb]}{[old] [505.6mb]->[505.6mb]/[10.3gb]}

[2019-04-12T03:16:50,738][WARN ][o.e.m.j.JvmGcMonitorService] [AAzI9z_] [gc][175648] overhead, spent [12.3s] collecting in the last [12.8s]

[2019-04-12T03:17:18,525][WARN ][o.e.m.j.JvmGcMonitorService] [AAzI9z_] [gc][young][175664][32] duration [12.1s], collections [1]/[12.6s], total [12.1s]/[26.3s], memory [1gb]->[657.5mb]/[10.9gb], all_pools {[young] [508mb]->[4.3mb]/[532.5mb]}{[survivor] [51.9mb]->[66.5mb]/[66.5mb]}{[old] [505.6mb]->[587mb]/[10.3gb]}

[2019-04-12T03:17:18,526][WARN ][o.e.m.j.JvmGcMonitorService] [AAzI9z_] [gc][175664] overhead, spent [12.1s] collecting in the last [12.6s]


(David Turner) #7

Thanks, ok, this is the default so I think that's not it.

This suggests that this node is under a lot of heap pressure, which is weird if the cluster is idle. Can you tell us a bit more about this cluster: for instance how many shards does it have?

I agree with the advice above that it'd be a good idea to disable swap for the reasons described in the manual. Also note that the 5.6 series is now past the end of its supported life so it could be tricky to dig into this in detail. Upgrading is recommended, although I don't know of any changes in 6.x that would address these specific symptoms.

Ideally I think I'd like to see a heap dump taken when a node goes OOM, but unfortunately we don't get a heap dump if a process is killed by the OOM killer. Are there GC warnings on other nodes like this, suggesting that other nodes are heading towards failure? If so, can you take a heap dump from such a node?


(Jalaluddeen Mohammed) #8

Thank you.

The cluster has 42 shards in total.

Unfortunately there are no heap dumps in any of the 12 nodes in the cluster.
And around the time of crash there are no GC warnings in the ES logs.


(David Turner) #9

Sure, you'll have to capture one by hand.

I'm confused, because previously you said:


(Jalaluddeen Mohammed) #10

What I meant is there are no GC warnings from other nodes, the above listed GC log is from the machine that crashed.