Hi Team,
I am trying to use the official Helm chart to build an Elasticsearch cluster in a Kubernetes cluster. The same chart works on CentOS Atomic, where the cluster came up successfully. However, with the same chart on RHEL 7.5, the Elasticsearch pods cycle through Running --> OOMKilled --> CrashLoopBackOff. I am not sure what is wrong with the values.yaml configuration. Please advise whether I need to change any settings on RHEL 7.5.
We have around 250GB of RAM on each worker node.
I am using Elasticsearch 7.5.2 with the default esJavaOpts. The relevant part of values.yaml:
image: "docker.elastic.co/elasticsearch/elasticsearch"
imageTag: "7.5.2"
imagePullPolicy: "IfNotPresent"
podAnnotations: {}
  # iam.amazonaws.com/role: es-cluster
# additionals labels
labels: {}
esJavaOpts: "-Xmx1g -Xms1g"
I do not see any errors when I run dmesg on the master nodes, but I do see the messages below on the worker nodes.
Kernel version:
[root@cesiumk8s-elk1 ~]# uname -a
Linux cesiumk8s-elk1.xxx.com 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
[root@cesiumk8s-elk1 ~]# uname -r
3.10.0-862.el7.x86_64
dmesg output from a worker node:
[530710.804005] Memory cgroup out of memory: Kill process 56296 (java) score 1990 or sacrifice child
[530710.805416] Killed process 55956 (java) total-vm:2608264kB, anon-rss:2082960kB, file-rss:5824kB, shmem-rss:0kB
[530742.811881] java invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=993
[530742.811886] java cpuset=8697e9c6cba52723e94ab606d496a3e068b1d1e16be1c672d54d68110e5ab900 mems_allowed=0-1
[530742.811889] CPU: 29 PID: 479 Comm: java Kdump: loaded Tainted: G ------------ T 3.10.0-862.el7.x86_64 #1
[530742.811891] Hardware name: Cisco Systems Inc UCSC-C240-M4SX/UCSC-C240-M4SX, BIOS C240M4.4.0.1d.0.1005181458 10/05/2018
[530742.811892] Call Trace:
[530742.811901] [<ffffffff9010d768>] dump_stack+0x19/0x1b
[530742.811903] [<ffffffff901090ea>] dump_header+0x90/0x229
[530742.811910] [<ffffffff8fb97456>] ? find_lock_task_mm+0x56/0xc0
[530742.811914] [<ffffffff8fc0b1f8>] ? try_get_mem_cgroup_from_mm+0x28/0x60
[530742.811916] [<ffffffff8fb97904>] oom_kill_process+0x254/0x3d0
[530742.811919] [<ffffffff8fc0efe6>] mem_cgroup_oom_synchronize+0x546/0x570
[530742.811921] [<ffffffff8fc0e460>] ? mem_cgroup_charge_common+0xc0/0xc0
[530742.811924] [<ffffffff8fb98194>] pagefault_out_of_memory+0x14/0x90
[530742.811926] [<ffffffff9010720c>] mm_fault_error+0x6a/0x157
[530742.811929] [<ffffffff9011a886>] __do_page_fault+0x496/0x4f0
[530742.811931] [<ffffffff9011a915>] do_page_fault+0x35/0x90
[530742.811935] [<ffffffff90116768>] page_fault+0x28/0x30
[530742.811938] Task in /kubepods/burstable/pod24a8a7ea-b704-4542-943e-4bb11a0ff9df/8697e9c6cba52723e94ab606d496a3e068b1d1e16be1c672d54d68110e5ab900 killed as a result of limit of /kubepods/burstable/pod24a8a7ea-b704-4542-943e-4bb11a0ff9df
[530742.811941] memory: usage 2097152kB, limit 2097152kB, failcnt 3651
[530742.811942] memory+swap: usage 2097152kB, limit 9007199254740988kB, failcnt 0
[530742.811943] kmem: usage 15412kB, limit 9007199254740988kB, failcnt 0
[530742.811944] Memory cgroup stats for /kubepods/burstable/pod24a8a7ea-b704-4542-943e-4bb11a0ff9df: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
[530742.811960] Memory cgroup stats for /kubepods/burstable/pod24a8a7ea-b704-4542-943e-4bb11a0ff9df/e64e6862ec7f1c2a40af2d5abe56719cc323b6a96e8c799779d399a33109617e: cache:0KB rss:40KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:40KB inactive_file:0KB active_file:0KB unevictable:0KB
[530742.811979] Memory cgroup stats for /kubepods/burstable/pod24a8a7ea-b704-4542-943e-4bb11a0ff9df/8697e9c6cba52723e94ab606d496a3e068b1d1e16be1c672d54d68110e5ab900: cache:40KB rss:2081660KB rss_huge:2066432KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:2081660KB inactive_file:40KB active_file:0KB unevictable:0KB
[530742.811992] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[530742.812260] [54433] 1000 54433 253 1 4 0 -998 pause
[530742.812266] [57057] 1000 57057 652066 521866 1052 0 993 java
[530742.812268] Memory cgroup out of memory: Kill process 479 (java) score 1989 or sacrifice child
[530742.813666] Killed process 57057 (java) total-vm:2608264kB, anon-rss:2081636kB, file-rss:5828kB, shmem-rss:0kB
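To put the figures above in perspective, here is a rough Python sketch (not part of the chart) that converts two of the dmesg lines into GiB and compares them with the configured heap. It assumes the kernel's "kB" values are KiB and that -Xmx1g from esJavaOpts is the heap size actually in effect. The names DMESG, HEAP_KIB, kib_to_gib and parse are my own, made up for illustration.

```python
import re

# Two lines copied verbatim from the dmesg output above.
DMESG = """\
[530742.811941] memory: usage 2097152kB, limit 2097152kB, failcnt 3651
[530742.813666] Killed process 57057 (java) total-vm:2608264kB, anon-rss:2081636kB, file-rss:5828kB, shmem-rss:0kB
"""

KIB_PER_GIB = 1024 * 1024
# Assumption: -Xmx1g from esJavaOpts is the maximum heap in effect.
HEAP_KIB = 1 * KIB_PER_GIB


def kib_to_gib(kib: int) -> float:
    """Convert KiB (printed as kB by the kernel) to GiB."""
    return kib / KIB_PER_GIB


def parse(text: str) -> dict:
    """Extract cgroup usage/limit and the killed process's anon-rss, in KiB."""
    mem = re.search(r"memory: usage (\d+)kB, limit (\d+)kB", text)
    rss = re.search(r"anon-rss:(\d+)kB", text)
    if mem is None or rss is None:
        raise ValueError("expected dmesg lines not found")
    usage, limit = (int(v) for v in mem.groups())
    return {
        "usage_kib": usage,
        "limit_kib": limit,
        "anon_rss_kib": int(rss.group(1)),
    }


if __name__ == "__main__":
    stats = parse(DMESG)
    beyond_heap = stats["anon_rss_kib"] - HEAP_KIB
    print(f"cgroup limit:      {kib_to_gib(stats['limit_kib']):.2f} GiB")
    print(f"cgroup usage:      {kib_to_gib(stats['usage_kib']):.2f} GiB")
    print(f"java anon-rss:     {kib_to_gib(stats['anon_rss_kib']):.2f} GiB")
    print(f"rss beyond -Xmx1g: {kib_to_gib(beyond_heap):.2f} GiB")
```

With these values, the pod cgroup limit works out to 2 GiB, and the killed java process held about 1.99 GiB of anonymous memory, so nearly 1 GiB more than the 1 GiB maximum heap.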
Thanks,
Kasim Shaik