Hi
I use a dedicated Elasticsearch cluster consisting of 9 nodes: 3 dedicated master nodes and 6 dedicated data nodes.
The cluster is used as a backend to collect logs from all of our company's production servers. Log traffic volume is about 160 GB/day, or 3000-6000 requests per second. Data node properties:
Hardware: i3.2xlarge EC2 instance (144 GB RAM, 2 x 1900 GB NVMe SSD disks in a striped LVM RAID)
Java heap size is set to 28 GB.
Elasticsearch is configured to lock memory on bootstrap, and the machines are configured not to use swap at all (no swap partition, vm.swappiness = 1); see the snippet below.
OS: Ubuntu 16.04 x64
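For reference, this is roughly how those memory settings are applied on my nodes (file paths assume the default deb package layout, and the sysctl file name is just what I chose):

# /etc/elasticsearch/elasticsearch.yml
bootstrap.memory_lock: true

# /etc/elasticsearch/jvm.options
-Xms28g
-Xmx28g

# /etc/sysctl.d/90-elasticsearch.conf
vm.swappiness = 1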
I noticed that the Elasticsearch service on different nodes gets killed (probably by the OOM killer).
from kern.log:
Jul 19 02:37:59 awses-dbnode1 kernel: [753817.897047] Out of memory: Kill process 67520 (java) score 526 or sacrifice child
Jul 19 02:37:59 awses-dbnode1 kernel: [753817.901374] Killed process 67520 (java) total-vm:1290332684kB, anon-rss:30413004kB, file-rss:35344396kB
Could it be the result of the lack of swap?
Is it safe to configure systemd to restart the Elasticsearch service automatically in case of a kill/crash?
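Something like this drop-in is what I have in mind (assuming the stock elasticsearch.service from the deb package; the restart delay is arbitrary):

# /etc/systemd/system/elasticsearch.service.d/restart.conf
[Service]
Restart=on-failure
RestartSec=30

followed by sudo systemctl daemon-reload.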
It seems like the issue has been resolved by adding swap space to the machines, while still leaving vm.swappiness = 1. I see very little swap usage on all nodes, but the OOM killer is not being triggered anymore.
I don't observe any performance degradation in the Elasticsearch cluster.
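For the record, this is roughly how I added the swap space on each data node (the 8 GB size is just what I picked, not a recommendation):

sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# make it persistent across reboots:
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab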
Nope. Yesterday one node still got killed by the OOM killer. It seems to be something to address with Java heap settings or kernel parameter tuning. I also upgraded to kernel aws-1063 today.
Maybe someone can point me to which Java/kernel settings to adjust to get rid of the OOM killer?