ELASTICSEARCH PROCESS DIES

I have an Elasticsearch cluster with 5 nodes, all with the same 8GB of RAM, but the Elasticsearch process only stays up when I set the heap with -Xms1g -Xmx1g. If I set -Xms4g -Xmx4g, Elasticsearch dies after some time, even without load.
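For reference, this is roughly how I set and check the heap (assuming a package install with the config under /etc/elasticsearch; your paths may differ):

grep -E '^-Xm[sx]' /etc/elasticsearch/jvm.options    # shows the current -Xms/-Xmx lines
sudo systemctl restart elasticsearch                 # restart the node after changing them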

[291288.485572] Out of memory: Kill process 3655 (java) score 591 or sacrifice child
[291288.485756] Killed process 3754 (controller) total-vm:136444kB, anon-rss:564kB, file-rss:664kB, shmem-rss:0kB

I'm using ES 7.5 with x-pack basic license active.

The logs you quoted aren't from Elasticsearch dying: Killed process 3754 (controller) indicates this is the ML controller dying instead. But it's likely that Elasticsearch dies shortly after.

Can you share (a) the complete dmesg output and (b) the complete Elasticsearch logs from when it starts up to when it dies? Use https://gist.github.com/ since it'll be too much to share here.
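For example, something like this should capture both, assuming a package install with logs under /var/log/elasticsearch (adjust the paths for your setup):

dmesg > dmesg-full.txt                  # complete kernel ring buffer
ls /var/log/elasticsearch/              # the main log file is named after the cluster
cp /var/log/elasticsearch/*.log .       # copy the Elasticsearch logs somewhere you can upload them from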

Here is the information you asked for: https://gist.github.com/ardoliveira/c458898cbddc2d653522fbcd0cab1e7c

Thanks. You didn't include the complete Elasticsearch logs, but here's what the kernel says:

[13431.218487] Killed process 727 (java) total-vm:7047832kB, anon-rss:4638168kB, file-rss:182436kB, shmem-rss:0kB

I.e. Elasticsearch was using 4.4GB (anon-rss is the important figure) when the host ran out of memory. This is well within what I'd expect with -Xmx4g since Elasticsearch assumes you have set the heap size to no more than 50% of the available RAM.

I'm not sure what else is using the rest of the RAM, but it doesn't seem to be Elasticsearch.
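If you want to dig into that, the standard tools will show it; nothing Elasticsearch-specific is needed, for example:

free -m                                   # overall memory picture in MB
ps -eo pid,rss,comm --sort=-rss | head    # processes by resident memory (kB), largest first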

So, I saw that, but this node has nothing else running, only Elasticsearch and OS processes.

Elasticsearch only stays up when I set the heap to around 1GB, or when I add 16GB of RAM to the host and set the ES heap to 4GB.

Note: this is strange because it only happens on my ES 7.5 cluster; on my other cluster running ES 6.5 I have a node with 16GB of RAM and the heap set to 8GB.

There are definitely differences in the structure of memory usage between 6.x and 7.x that could account for the difference in behaviour you're seeing. But Elasticsearch is still using (much) less than the expected limit of 8GB of memory when it's killed.

There's other weirdness in the kernel logs too:

[13431.218115] kworker/1:1 invoked oom-killer: gfp_mask=0x6200c2(GFP_HIGHUSER), nodemask=(null), order=0, oom_score_adj=0

order=0 means the failed allocation is a single 4kB page, but ...

[13431.218193] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15900kB
[13431.218201] Node 0 DMA32: 262*4kB (UME) 229*8kB (UME) 202*16kB (ME) 177*32kB (UME) 159*64kB (UME) 90*128kB (UME) 39*256kB (UME) 3*512kB (UM) 0*1024kB 0*2048kB 0*4096kB = 44992kB
[13431.218209] Node 0 Normal: 1791*4kB (MEH) 1171*8kB (UMEH) 703*16kB (UMEH) 281*32kB (UME) 81*64kB (UM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 41956kB

... all zones have enough free space to satisfy that.
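As an aside, you can see the same free-list breakdown at any time, not just in an OOM report:

cat /proc/buddyinfo    # counts of free blocks per zone, one column per order starting at 4kB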

This StackOverflow answer is consistent with your log indicating free < min in the Normal zone ...

[13431.218188] Node 0 Normal free:41956kB min:42192kB low:52740kB high:63288kB active_anon:62984kB inactive_anon:8384kB active_file:76kB inactive_file:100kB unevictable:4651284kB writepending:16kB present:5242880kB managed:5085632kB mlocked:4651284kB kernel_stack:2576kB pagetables:13724kB bounce:0kB free_pcp:4kB local_pcp:4kB free_cma:0kB

... and indicates a known kernel bug that could cause this. What kernel are you using and is it affected?
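For example (the second command assumes a RHEL/CentOS-style system):

uname -r                   # kernel release
cat /etc/redhat-release    # distro and version, if present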

I'm using kernel 4.18.0-80.11.2.el8_0.x86_64 on CentOS 8.
