ELASTICSEARCH PROCESS DIES

I have an Elasticsearch cluster with 5 nodes, all with the same 8GB of RAM, but the Elasticsearch process only stays up when I set the heap with -Xms1g -Xmx1g. If I set -Xms4g -Xmx4g, Elasticsearch dies after some time, even without load.
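For reference, this is roughly how I set and check the heap (assuming a package install with the config under /etc/elasticsearch; your paths may differ):

grep -E '^-Xm[sx]' /etc/elasticsearch/jvm.options    # shows the current -Xms/-Xmx lines
sudo systemctl restart elasticsearch                 # restart the node after changing them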

[291288.485572] Out of memory: Kill process 3655 (java) score 591 or sacrifice child
[291288.485756] Killed process 3754 (controller) total-vm:136444kB, anon-rss:564kB, file-rss:664kB, shmem-rss:0kB

I'm using ES 7.5 with x-pack basic license active.

The logs you quoted aren't from Elasticsearch dying: Killed process 3754 (controller) indicates this is the ML controller dying instead. But it's likely that Elasticsearch dies shortly after.

Can you share (a) the complete dmesg output and (b) the complete Elasticsearch logs from when it starts up to when it dies? Use https://gist.github.com/ since it'll be too much to share here.
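For example, something like this should capture both, assuming a package install with logs under /var/log/elasticsearch (adjust the paths for your setup):

dmesg > dmesg-full.txt                  # complete kernel ring buffer
ls /var/log/elasticsearch/              # the main log file is named after the cluster
cp /var/log/elasticsearch/*.log .       # copy the Elasticsearch logs somewhere you can upload them from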

Here is the information you asked for: https://gist.github.com/ardoliveira/c458898cbddc2d653522fbcd0cab1e7c

Thanks. You didn't include the complete Elasticsearch logs, but here's what the kernel says:

[13431.218487] Killed process 727 (java) total-vm:7047832kB, anon-rss:4638168kB, file-rss:182436kB, shmem-rss:0kB

I.e. Elasticsearch was using 4.4GB (anon-rss is the important figure) when the host ran out of memory. This is well within what I'd expect with -Xmx4g since Elasticsearch assumes you have set the heap size to no more than 50% of the available RAM.

I'm not sure what else is using the rest of the RAM, but it doesn't seem to be Elasticsearch.
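If you want to dig into that, the standard tools will show it; nothing Elasticsearch-specific is needed, for example:

free -m                                   # overall memory picture in MB
ps -eo pid,rss,comm --sort=-rss | head    # processes by resident memory (kB), largest first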

So, I saw that, but this node has nothing else running, only Elasticsearch and OS processes.

Elasticsearch only stays up when I set the heap to around 1GB, or when I add 16GB of RAM to the host and set the ES heap to 4GB.

Note: this is strange because it only happens on my ES 7.5 cluster; on my other cluster running ES 6.5 I have a node with 16GB of RAM and the heap set to 8GB.

There are definitely differences in the structure of memory usage between 6.x and 7.x that could account for the difference in behaviour you're seeing. But Elasticsearch is still using (much) less than the expected limit of 8GB of memory when it's killed.

There's other weirdness in the kernel logs too:

[13431.218115] kworker/1:1 invoked oom-killer: gfp_mask=0x6200c2(GFP_HIGHUSER), nodemask=(null), order=0, oom_score_adj=0

order=0 means the failed allocation is a single 4kB page, but ...

[13431.218193] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15900kB
[13431.218201] Node 0 DMA32: 262*4kB (UME) 229*8kB (UME) 202*16kB (ME) 177*32kB (UME) 159*64kB (UME) 90*128kB (UME) 39*256kB (UME) 3*512kB (UM) 0*1024kB 0*2048kB 0*4096kB = 44992kB
[13431.218209] Node 0 Normal: 1791*4kB (MEH) 1171*8kB (UMEH) 703*16kB (UMEH) 281*32kB (UME) 81*64kB (UM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 41956kB

... all zones have enough free space to satisfy that.
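As an aside, you can see the same free-list breakdown at any time, not just in an OOM report:

cat /proc/buddyinfo    # counts of free blocks per zone, one column per order starting at 4kB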

This StackOverflow answer is consistent with your log indicating free < min in the Normal zone ...

[13431.218188] Node 0 Normal free:41956kB min:42192kB low:52740kB high:63288kB active_anon:62984kB inactive_anon:8384kB active_file:76kB inactive_file:100kB unevictable:4651284kB writepending:16kB present:5242880kB managed:5085632kB mlocked:4651284kB kernel_stack:2576kB pagetables:13724kB bounce:0kB free_pcp:4kB local_pcp:4kB free_cma:0kB

... and indicates a known kernel bug that could cause this. What kernel are you using and is it affected?
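For example (the second command assumes a RHEL/CentOS-style system):

uname -r                   # kernel release
cat /etc/redhat-release    # distro and version, if present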

I'm using kernel 4.18.0-80.11.2.el8_0.x86_64 on CentOS 8.
