Logstash memory leak on RPi

Hi, my Logstash (7.6.2) instance has a memory leak.
Xms and Xmx are both set to 512m, and I have one very simple pipeline (rest->simple map->elastic) that runs a few times per hour.
After startup Logstash's memory usage keeps growing. After 5-6 hours it exhausts all my physical memory (4 GB + 0.5 GB swap), the process is killed, and Logstash automatically starts again.
VisualVM looks good


Logstash is running on an RPi4 with 4 GB of RAM, on adoptopenjdk-8-hotspot-armhf and Raspbian Buster.
The GC logs also look good; the only thing I see at debug level is the same records repeating every few seconds:

[DEBUG][logstash.instrument.periodicpoller.cgroup.cpuresource] File /sys/fs/cgroup/cpu/cpu.cfs_period_us cannot be found, try providing an override 'ls.cgroup.cpu.path.override' in the Logstash JAVA_OPTS environment variable
[DEBUG][logstash.instrument.periodicpoller.cgroup.cpuresource] File /sys/fs/cgroup/cpu/cpu.cfs_quota_us cannot be found, try providing an override 'ls.cgroup.cpu.path.override' in the Logstash JAVA_OPTS environment variable
[DEBUG][logstash.instrument.periodicpoller.cgroup.cpuresource] File /sys/fs/cgroup/cpu/cpu.stat cannot be found, try providing an override 'ls.cgroup.cpu.path.override' in the Logstash JAVA_OPTS environment variable
[DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ParNew"}
[DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ConcurrentMarkSweep"}
[DEBUG][org.logstash.execution.PeriodicFlush][mailgun] Pushing flush onto pipeline.

When memory runs out I get this dump in my syslog:

kernel: [137863.046945] Hardware name: Raspberry Pi 4 Model B Rev 1.2 (DT)
kernel: [137863.046949] Call trace:
kernel: [137863.046958]  dump_backtrace+0x0/0x178
kernel: [137863.046961]  show_stack+0x24/0x30
kernel: [137863.046968]  dump_stack+0xa4/0xdc
kernel: [137863.046974]  dump_header+0x64/0x248
kernel: [137863.046978]  oom_kill_process+0x320/0x370
kernel: [137863.046982]  out_of_memory+0xf8/0x308
kernel: [137863.046985]  __alloc_pages_nodemask+0xda0/0xe38
kernel: [137863.046989]  filemap_fault+0x430/0x590
kernel: [137863.046995]  ext4_filemap_fault+0x38/0x58
kernel: [137863.047000]  __do_fault+0x58/0x120
kernel: [137863.047003]  __handle_mm_fault+0x800/0xb28
kernel: [137863.047006]  handle_mm_fault+0x130/0x238
kernel: [137863.047011]  do_page_fault+0x1c0/0x498
kernel: [137863.047015]  do_translation_fault+0xa8/0xb8
kernel: [137863.047019]  do_mem_abort+0x68/0x128
kernel: [137863.047022]  el0_da+0x20/0x24
kernel: [137863.047025] Mem-Info:
kernel: [137863.047034] active_anon:649585 inactive_anon:259500 isolated_anon:0
kernel: [137863.047034]  active_file:523 inactive_file:1122 isolated_file:71
kernel: [137863.047034]  unevictable:2068 dirty:0 writeback:2 unstable:0
kernel: [137863.047034]  slab_reclaimable:16240 slab_unreclaimable:18722
kernel: [137863.047034]  mapped:3258 shmem:16120 pagetables:3817 bounce:0
kernel: [137863.047034]  free:6904 free_pcp:0 free_cma:963
kernel: [137863.047042] Node 0 active_anon:2598340kB inactive_anon:1038000kB active_file:2092kB inactive_file:4488kB unevictable:8272kB isolated(anon):0kB isolated(file):284kB mapped:13032kB dirty:0kB writeback:8kB shmem:64480kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
kernel: [137863.047050] DMA32 free:15340kB min:3744kB low:4680kB high:5616kB active_anon:739272kB inactive_anon:66816kB active_file:860kB inactive_file:3052kB unevictable:4368kB writepending:4kB present:970752kB managed:891816kB mlocked:4368kB kernel_stack:464kB pagetables:1668kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:3852kB
kernel: [137863.047053] lowmem_reserve[]: 0 2936 2936
kernel: [137863.047066] Normal free:12276kB min:12636kB low:15792kB high:18948kB active_anon:1859068kB inactive_anon:971184kB active_file:956kB inactive_file:1564kB unevictable:3904kB writepending:4kB present:3080192kB managed:3007256kB mlocked:3904kB kernel_stack:5392kB pagetables:13600kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
kernel: [137863.047069] lowmem_reserve[]: 0 0 0
kernel: [137863.047076] DMA32: 554*4kB (UMEC) 427*8kB (UMEC) 172*16kB (UMEC) 58*32kB (UMEC) 22*64kB (UMEC) 20*128kB (UEC) 4*256kB (UE) 1*512kB (C) 0*1024kB 0*2048kB 0*4096kB = 15744kB
kernel: [137863.047103] Normal: 1527*4kB (UME) 376*8kB (UME) 48*16kB (UME) 19*32kB (UME) 2*64kB (E) 1*128kB (U) 4*256kB (U) 1*512kB (U) 0*1024kB 0*2048kB 0*4096kB = 12284kB
kernel: [137863.047130] 25108 total pagecache pages
kernel: [137863.047134] 6720 pages in swap cache
kernel: [137863.047138] Swap cache stats: add 162998, delete 156275, find 168470/174626
kernel: [137863.047141] Free swap  = 0kB
kernel: [137863.047144] Total swap = 524284kB
kernel: [137863.047147] 1012736 pages RAM
kernel: [137863.047150] 0 pages HighMem/MovableOnly
kernel: [137863.047153] 37968 pages reserved
kernel: [137863.047156] 16384 pages cma reserved
kernel: [137863.047159] Tasks state (memory values in pages):
kernel: [137863.047162] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
...
kernel: [137863.047382] [   3661]   118  3661   279991   141917  2224128   105626             0 java
...
kernel: [137863.047453] [  27766]   999 27766   777304   656494  5505024        3             0 java
...
kernel: [137863.047847] Out of memory: Kill process 27766 (java) score 594 or sacrifice child
kernel: [137863.047925] Killed process 27766 (java) total-vm:3109216kB, anon-rss:2625976kB, file-rss:0kB, shmem-rss:0kB
kernel: [137863.366369] oom_reaper: reaped process 27766 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
systemd-udevd[153]: 00-3a5000000000: Worker [8754] processing SEQNUM=4196 killed
systemd-udevd[153]: Worker [8754] terminated by signal 9 (KILL)
systemd-udevd[153]: 00-3a5000000000: Worker [8754] failed
systemd-udevd[153]: 00-ba5000000000: Worker [8781] processing SEQNUM=4197 is taking a long time
systemd-udevd[153]: 00-7a5000000000: Worker [8783] processing SEQNUM=4198 is taking a long time
systemd-udevd[153]: 00-065000000000: Worker [8799] processing SEQNUM=4200 is taking a long time
systemd-udevd[153]: 00-fa5000000000: Worker [8794] processing SEQNUM=4199 is taking a long time
systemd-udevd[153]: 00-fa5000000000: Worker [8794] processing SEQNUM=4199 killed
CRON[8795]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
systemd-udevd[153]: Worker [8794] terminated by signal 9 (KILL)
dbus-daemon[363]: [system] Connection has not authenticated soon enough, closing it (auth_timeout=30000ms, elapsed: 44391ms)
systemd-udevd[153]: 00-fa5000000000: Worker [8794] failed
systemd-udevd[153]: 00-065000000000: Worker [8799] processing SEQNUM=4200 killed
smartd[376]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 107 to 108
systemd-udevd[153]: Worker [8799] terminated by signal 9 (KILL)
systemd-udevd[153]: 00-065000000000: Worker [8799] failed
systemd-udevd[153]: 00-865000000000: Worker [8820] processing SEQNUM=4201 is taking a long time
systemd-udevd[153]: 00-865000000000: Worker [8820] processing SEQNUM=4201 killed
systemd-udevd[153]: Worker [8820] terminated by signal 9 (KILL)
systemd-udevd[153]: 00-865000000000: Worker [8820] failed
systemd-udevd[153]: 00-465000000000: Worker [8836] processing SEQNUM=4202 is taking a long time
systemd-udevd[153]: 00-465000000000: Worker [8836] processing SEQNUM=4202 killed
systemd-udevd[153]: Worker [8836] terminated by signal 9 (KILL)
systemd-udevd[153]: 00-465000000000: Worker [8836] failed
systemd-udevd[153]: 00-c65000000000: Worker [8863] processing SEQNUM=4203 is taking a long time
CRON[8860]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
systemd-udevd[153]: 00-265000000000: Worker [8871] processing SEQNUM=4204 is taking a long time
systemd-udevd[153]: 00-c65000000000: Worker [8863] processing SEQNUM=4203 killed
systemd-udevd[153]: Worker [8863] terminated by signal 9 (KILL)
CRON[8868]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
systemd-udevd[153]: 00-c65000000000: Worker [8863] failed
dbus-daemon[363]: [system] Connection has not authenticated soon enough, closing it (auth_timeout=30000ms, elapsed: 57891ms)
systemd-udevd[153]: 00-a65000000000: Worker [8883] processing SEQNUM=4205 is taking a long time
dbus-daemon[363]: [system] Connection has not authenticated soon enough, closing it (auth_timeout=30000ms, elapsed: 119396ms)
systemd-udevd[153]: 00-665000000000: Worker [8883] processing SEQNUM=4206 is taking a long time
systemd-udevd[153]: 00-665000000000: Worker [8883] processing SEQNUM=4206 killed
systemd-udevd[153]: Worker [8883] terminated by signal 9 (KILL)
dbus-daemon[363]: [system] Connection has not authenticated soon enough, closing it (auth_timeout=30000ms, elapsed: 97411ms)
systemd-udevd[153]: 00-e65000000000: Worker [8871] processing SEQNUM=4207 is taking a long time
CRON[8894]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
systemd-udevd[153]: 00-165000000000: Worker [8913] processing SEQNUM=4208 is taking a long time
systemd-udevd[153]: 00-e65000000000: Worker [8871] processing SEQNUM=4207 killed
dbus-daemon[363]: [system] Connection has not authenticated soon enough, closing it (auth_timeout=30000ms, elapsed: 59445ms)
systemd-udevd[153]: Worker [8871] terminated by signal 9 (KILL)
systemd-udevd[153]: 00-e65000000000: Worker [8871] failed
systemd-udevd[153]: 00-165000000000: Worker [8913] processing SEQNUM=4208 killed
CRON[8934]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
systemd-udevd[153]: Worker [8913] terminated by signal 9 (KILL)
systemd-udevd[153]: 00-165000000000: Worker [8913] failed
systemd-udevd[153]: 00-965000000000: Worker [8932] processing SEQNUM=4209 is taking a long time
dbus-daemon[363]: [system] Connection has not authenticated soon enough, closing it (auth_timeout=30000ms, elapsed: 53374ms)
systemd-udevd[153]: 00-965000000000: Worker [8932] processing SEQNUM=4209 killed
systemd-udevd[153]: 00-565000000000: Worker [8936] processing SEQNUM=4210 is taking a long time
dbus-daemon[363]: [system] Connection has not authenticated soon enough, closing it (auth_timeout=30000ms, elapsed: 52022ms)
systemd-udevd[153]: Worker [8932] terminated by signal 9 (KILL)
systemd-udevd[153]: 00-965000000000: Worker [8932] failed
systemd-udevd[153]: 00-565000000000: Worker [8936] processing SEQNUM=4210 killed
systemd-udevd[153]: 00-365000000000: Worker [8955] processing SEQNUM=4212 is taking a long time
systemd-udevd[153]: 00-d65000000000: Worker [8940] processing SEQNUM=4211 is taking a long time
systemd-udevd[153]: Worker [8936] terminated by signal 9 (KILL)
systemd-udevd[153]: 00-365000000000: Worker [8955] processing SEQNUM=4212 killed
systemd-udevd[153]: 00-b65000000000: Worker [8967] processing SEQNUM=4213 is taking a long time
systemd-udevd[153]: Worker [8955] terminated by signal 9 (KILL)
systemd-udevd[153]: 00-365000000000: Worker [8955] failed
systemd-udevd[153]: 00-b65000000000: Worker [8967] processing SEQNUM=4213 killed
systemd-udevd[153]: Worker [8967] terminated by signal 9 (KILL)
systemd-udevd[153]: 00-b65000000000: Worker [8967] failed
systemd[1]: logstash.service: Main process exited, code=killed, status=9/KILL
systemd[1]: logstash.service: Failed with result 'signal'.
systemd-udevd[153]: 00-765000000000: Worker [8940] processing SEQNUM=4214 is taking a long time
systemd-udevd[153]: 00-765000000000: Worker [8940] processing SEQNUM=4214 killed
systemd-udevd[153]: Worker [8940] terminated by signal 9 (KILL)
systemd[1]: logstash.service: Service RestartSec=100ms expired, scheduling restart.
systemd[1]: logstash.service: Scheduled restart job, restart counter is at 6.
systemd[1]: Stopped logstash.

Hello!

Would it be possible to share:

  • the exact JVM version (1.8.?)
  • the configuration file logstash.yml
  • the pipeline file

From the graph I cannot see any increase in JVM heap usage (quite stable at around 200 MB), which means that if there is a leak it is probably in native/direct memory, since the process was killed by the Linux OOM killer.

JVM:

openjdk version "1.8.0_252"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_252-b09)
OpenJDK Zero VM (AdoptOpenJDK)(build 25.252-b09, interpreted mode)

jvm.options:

-Xms512m
-Xmx512m

-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly

-Djava.awt.headless=true

-Dfile.encoding=UTF-8

-Djruby.compile.invokedynamic=true
-Djruby.jit.threshold=0
-Djruby.regexp.interruptible=true

-XX:+HeapDumpOnOutOfMemoryError

-Xloggc:/mnt/disk/logs/logstash/gc.log

-Djava.security.egd=file:/dev/urandom

-Dlog4j2.isThreadContextMapInheritable=true

-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.port=1098

logstash.yml:

pipeline.batch.size: 1
pipeline.batch.delay: 50

path.data: /var/lib/logstash

pipeline.workers: 2

queue.max_bytes: 200mb

log.level: debug
path.logs: /mnt/disk/logs/logstash

config.reload.automatic: false
config.reload.interval: 3s

Pipelines:

input {
  http {
    port => 8099
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "mailgun-%{+YYYY.MM.dd}"
  }
}

Hello @Sonkun

Thank you!

I would start by adding -XX:MaxDirectMemorySize=256m to jvm.options.
This will not solve the problem, but it will tell us whether Logstash is trying to use too much direct memory, because exceeding that limit will trigger an exception.

Try increasing the batch size to at least 50 events and setting the pipeline workers to 1.
Right now you're sending one event per bulk request, which is not optimal.
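
A minimal sketch of those two changes (the values are the ones suggested above; adjust them to your workload):

jvm.options:

# Cap off-heap (direct) buffers so excessive direct-memory use fails
# with an OutOfMemoryError instead of silently growing the process
-XX:MaxDirectMemorySize=256m

logstash.yml:

# Fewer workers and bigger batches mean fewer, larger bulk requests to Elasticsearch
pipeline.workers: 1
pipeline.batch.size: 50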

After 9 hours memory is still increasing, but much more slowly: Logstash now consumes 1228 MB of resident memory and 1664 MB of virtual memory.
So the time from start to OOM probably goes from 5-6 hours to ~25-30 hours.

Would it be possible to collect some stats periodically, e.g. jmap -histo:live <ls pid>?

Or, if you can, collect a heap dump now and another one in a few hours so we can compare them.

I have an RPi4 and I might run a similar test...

Are the clients that send data via HTTP reusing their connections?

Another point is to verify whether Logstash manages to accept incoming connections fast enough.
Can you run lsof -i -p <ls pid> once, wait an hour, then run it again and post both outputs?
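
A rough sketch of how those stats could be collected on a schedule (<ls pid> stands for the Logstash PID; the output paths are placeholders):

while true; do
  date >> /tmp/ls-histo.log
  jmap -histo:live <ls pid> | head -n 40 >> /tmp/ls-histo.log   # top classes on the live heap
  date >> /tmp/ls-lsof.log
  lsof -a -i -p <ls pid> >> /tmp/ls-lsof.log                    # open network sockets (-a ANDs the filters)
  sleep 3600
done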

Use GitHub Gist or any other service you prefer to share those files.

I'm not sure whether the client closes its connections; that's not under my control.
Also, I can't execute jmap ("target process doesn't respond within 10500 ms or HotSpot VM not loaded").
But I can create a heap dump from remote VisualVM, so I'll post one here after 2 hours.
lsof also works fine, and I'll post its output here together with the heap dump.

Here are the lsof files:



The heap dumps are too big (170 MB) and not UTF-8 compatible, so GitHub Gist won't accept them.
https://gofile.io/d/gkqKZ5

And here is the next out-of-memory kill, from syslog:

[337809.869077] modprobe invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
[337809.869085] modprobe cpuset=/ mems_allowed=0
[337809.869098] CPU: 0 PID: 29286 Comm: modprobe Tainted: G         C        4.19.97-v8+ #1294
[337809.869102] Hardware name: Raspberry Pi 4 Model B Rev 1.2 (DT)
[337809.869105] Call trace:
[337809.869113]  dump_backtrace+0x0/0x178
[337809.869117]  show_stack+0x24/0x30
[337809.869123]  dump_stack+0xa4/0xdc
[337809.869130]  dump_header+0x64/0x248
[337809.869134]  oom_kill_process+0x320/0x370
[337809.869138]  out_of_memory+0xf8/0x308
[337809.869141]  __alloc_pages_nodemask+0xda0/0xe38
[337809.869145]  filemap_fault+0x430/0x590
[337809.869150]  ext4_filemap_fault+0x38/0x58
[337809.869155]  __do_fault+0x58/0x120
[337809.869158]  __handle_mm_fault+0x800/0xb28
[337809.869161]  handle_mm_fault+0x130/0x238
[337809.869166]  do_page_fault+0x1c0/0x498
[337809.869171]  do_translation_fault+0xa8/0xb8
[337809.869174]  do_mem_abort+0x68/0x128
[337809.869177]  el0_da+0x20/0x24
[337809.869180] Mem-Info:
[337809.869188] active_anon:605012 inactive_anon:299728 isolated_anon:0
[337809.869188]  active_file:259 inactive_file:255 isolated_file:0
[337809.869188]  unevictable:1971 dirty:0 writeback:0 unstable:0
[337809.869188]  slab_reclaimable:20089 slab_unreclaimable:19650
[337809.869188]  mapped:3090 shmem:45970 pagetables:3406 bounce:0
[337809.869188]  free:6912 free_pcp:1337 free_cma:291
[337809.869196] Node 0 active_anon:2420048kB inactive_anon:1198912kB active_file:1036kB inactive_file:1020kB unevictable:7884kB isolated(anon):0kB isolated(file):0kB mapped:12360kB dirty:0kB writeback:0kB shmem:183880kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[337809.869204] DMA32 free:15432kB min:3744kB low:4680kB high:5616kB active_anon:702528kB inactive_anon:102976kB active_file:356kB inactive_file:76kB unevictable:1028kB writepending:0kB present:970752kB managed:891816kB mlocked:1028kB kernel_stack:448kB pagetables:1712kB bounce:0kB free_pcp:4888kB local_pcp:1352kB free_cma:1164kB
[337809.869207] lowmem_reserve[]: 0 2936 2936
[337809.869219] Normal free:12216kB min:12636kB low:15792kB high:18948kB active_anon:1717520kB inactive_anon:1095936kB active_file:1060kB inactive_file:1324kB unevictable:6856kB writepending:0kB present:3080192kB managed:3007256kB mlocked:6856kB kernel_stack:4880kB pagetables:11912kB bounce:0kB free_pcp:460kB local_pcp:64kB free_cma:0kB
[337809.869221] lowmem_reserve[]: 0 0 0
[337809.869229] DMA32: 389*4kB (UMEC) 200*8kB (UEC) 297*16kB (UEC) 81*32kB (UEC) 39*64kB (UEC) 16*128kB (UEC) 1*256kB (C) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15300kB
[337809.869255] Normal: 1209*4kB (UMEH) 128*8kB (UMEH) 215*16kB (UMEH) 63*32kB (UEH) 19*64kB (UEH) 3*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 12916kB
[337809.869280] 57261 total pagecache pages
[337809.869284] 10067 pages in swap cache
[337809.869288] Swap cache stats: add 204706, delete 194636, find 398088/406463
[337809.869291] Free swap  = 0kB
[337809.869294] Total swap = 524284kB
[337809.869297] 1012736 pages RAM
[337809.869301] 0 pages HighMem/MovableOnly
[337809.869303] 37968 pages reserved
[337809.869306] 16384 pages cma reserved
[337809.869310] Tasks state (memory values in pages):
[337809.869313] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[337809.869329] [    119]     0   119     9478      900   196608      134             0 systemd-journal
[337809.869335] [    140]     0   140     1135       29    36864       14             0 blkmapd
[337809.869340] [    153]     0   153     4522      296    45056      110         -1000 systemd-udevd
[337809.869348] [    329]     0   329      784      315    32768       11             0 rpc.idmapd
[337809.869353] [    342]   103   342     1642      262    40960       34             0 rpcbind
[337809.869358] [    343]     0   343     1926      419    36864      648             0 haveged
[337809.869364] [    353]     0   353     6463      311    53248       27             0 rsyslogd
[337809.869369] [    356] 65534   356     1241      270    36864        6             0 thd
[337809.869374] [    359]   105   359     1636      240    40960        6             0 avahi-daemon
[337809.869379] [    360]     0   360      921      153    32768        1             0 alsactl
[337809.869384] [    363]   104   363     1832      338    45056       51          -900 dbus-daemon
[337809.869390] [    364]     0   364     2683      260    49152       41             0 wpa_supplicant
[337809.869395] [    372]     0   372     3372      243    57344       72             0 systemd-logind
[337809.869400] [    376]     0   376     1452      121    40960       71             0 smartd
[337809.869405] [    379]     0   379      464      250    32768       12             0 atopacctd
[337809.869410] [    463]     0   463     2835      349    53248       53             0 wpa_supplicant
[337809.869415] [    515]   105   515     1604       56    40960       18             0 avahi-daemon
[337809.869430] [    570]     0   570      533       24    28672        9             0 hciattach
[337809.869435] [    573]     0   573     2382      103    45056       40             0 bluetoothd
[337809.869440] [    629]     0   629      740       97    32768        4             0 dhcpcd
[337809.869451] [    640]     0   640     7993      366    90112      268             0 nmbd
[337809.869456] [    645]   106   645     2178      402    45056       14             0 ntpd
[337809.869461] [    649]     0   649     2730      151    49152      116         -1000 sshd
[337809.869466] [    650]     0   650     7906      435    86016     6097             0 rpc.mountd
[337809.869472] [    667]     0   667     2079      202    45056       39             0 cron
[337809.869477] [    791]     0   791     1162      284    32768       14             0 agetty
[337809.869493] [   1123]   109  1123     3675      227    49152      111             0 exim4
[337809.869514] [   1165]     0  1165    11356      540   110592      183             0 cleanupd
[337809.869520] [   1171]     0  1171    12004      522   118784      175             0 lpqd
[337809.869525] [   1204]  1005  1204     3127      375    53248      219             0 rsync
[337809.869530] [   2248]     0  2248     9576      345    69632       65             0 polkitd
[337809.869536] [   3661]   118  3661   280914   144439  2236416   103679             0 java
[337809.869541] [   4365]  1005  4365   106226    59852  1597440    14796             0 node
[337809.869546] [  29604]     0 29604     4920      561    61440      145             0 proftpd
[337809.869551] [  25561]   999 25561   685198   571477  4841472      344             0 java
[337809.869557] [  26472]     0 26472     2168     1882    40960        0             0 atop
[337809.869606] [  16148]     0 16148    22851     1979   184320      127             0 smbd
[337809.869625] [  29196]     0 29196     2631      105    40960        0             0 sshd
[337809.869630] [  29198]     0 29198     2631      104    40960        0             0 sshd
[337809.869635] [  29202]  1000 29202    12803       83    61440        0             0 php
[337809.869640] [  29203]     0 29203     2631      105    36864        0             0 sshd
[337809.869645] [  29207]     0 29207     2631      121    40960        0             0 sshd
[337809.869650] [  29209]     0 29209     2631      176    40960        0             0 sshd
[337809.869655] [  29210]     0 29210     2631      197    40960        0             0 sshd
[337809.869660] [  29211]     0 29211     2398      295    45056       10             0 cron
[337809.869664] [  29213]     0 29213     2631      105    40960        0             0 sshd
[337809.869669] [  29215]     0 29215     2631      193    40960        0             0 sshd
[337809.869674] [  29216]     0 29216     2631      104    40960        0             0 sshd
[337809.869679] [  29219]     0 29219     2631      215    36864        0             0 sshd
[337809.869684] [  29224]     0 29224     2631      162    40960        0             0 sshd
[337809.869689] [  29227]     0 29227     2631      167    40960        0             0 sshd
[337809.869695] [  29228]     0 29228     2631      201    40960        0             0 sshd
[337809.869700] [  29230]     0 29230     2398      395    45056       10             0 cron
[337809.869704] [  29231]     0 29231     2631      194    40960        0             0 sshd
[337809.869710] [  29237]     0 29237     2631      103    40960        0             0 sshd
[337809.869715] [  29238]     0 29238     2398      334    45056       12             0 cron
[337809.869719] [  29239]   109 29239     3300       61    45056        0             0 exim4
[337809.869724] [  29241]     0 29241     2631      104    36864        0             0 sshd
[337809.869729] [  29245]     0 29245     2631      212    40960        0             0 sshd
[337809.869735] [  29249]     0 29249     2631      104    40960        0             0 sshd
[337809.869739] [  29250]     0 29250     2398      368    45056       12             0 cron
[337809.869744] [  29257]     0 29257     2631       81    36864        0             0 sshd
[337809.869750] [  29262]     0 29262     2631       74    40960        0             0 sshd
[337809.869755] [  29264]     0 29264     2631      102    36864        0             0 sshd
[337809.869760] [  29266]     0 29266     1126       47    28672        0             0 modprobe
[337809.869764] [  29267]     0 29267     1126       46    28672        0             0 modprobe
[337809.869769] [  29268]     0 29268     1126       47    28672        0             0 modprobe
[337809.869775] [  29272]     0 29272     2631      108    40960        0             0 sshd
[337809.869779] [  29274]     0 29274     1126       46    28672        0             0 modprobe
[337809.869785] [  29280]     0 29280     1126      118    28672        0             0 modprobe
[337809.869790] [  29281]     0 29281     2398      304    45056       13             0 cron
[337809.869795] [  29284]     0 29284     2629       69    36864        0             0 sshd
[337809.869800] [  29285]     0 29285     2086       47    36864        0             0 sshd
[337809.869805] [  29286]     0 29286     1126       22    28672        0             0 modprobe
[337809.869810] Out of memory: Kill process 25561 (java) score 518 or sacrifice child
[337809.869873] Killed process 25561 (java) total-vm:2740792kB, anon-rss:2285908kB, file-rss:0kB, shmem-rss:0kB
[337810.385501] oom_reaper: reaped process 25561 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
smartd[376]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 106 to 107
systemd-udevd[153]: 00-3a5800000000: Worker [29205] processing SEQNUM=8292 killed
systemd-udevd[153]: Worker [29205] terminated by signal 9 (KILL)
CRON[29265]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
systemd-udevd[153]: 00-3a5800000000: Worker [29205] failed
systemd-udevd[153]: 00-ba5800000000: Worker [29218] processing SEQNUM=8293 is taking a long time
dbus-daemon[363]: [system] Connection has not authenticated soon enough, closing it (auth_timeout=30000ms, elapsed: 43275ms)
systemd-udevd[153]: 00-ba5800000000: Worker [29218] processing SEQNUM=8293 killed
systemd-udevd[153]: Worker [29218] terminated by signal 9 (KILL)
systemd-udevd[153]: 00-ba5800000000: Worker [29218] failed
systemd-udevd[153]: 00-7a5800000000: Worker [29244] processing SEQNUM=8294 is taking a long time
systemd-udevd[153]: 00-7a5800000000: Worker [29244] processing SEQNUM=8294 killed
systemd-udevd[153]: Worker [29244] terminated by signal 9 (KILL)
systemd-udevd[153]: 00-7a5800000000: Worker [29244] failed
systemd-udevd[153]: 00-fa5800000000: Worker [29258] processing SEQNUM=8295 is taking a long time
systemd-udevd[153]: 00-fa5800000000: Worker [29258] processing SEQNUM=8295 killed
systemd-udevd[153]: Worker [29258] terminated by signal 9 (KILL)
systemd-udevd[153]: 00-fa5800000000: Worker [29258] failed
systemd-udevd[153]: 00-065800000000: Worker [29277] processing SEQNUM=8296 is taking a long time
systemd-udevd[153]: 00-065800000000: Worker [29277] processing SEQNUM=8296 killed
systemd-udevd[153]: Worker [29277] terminated by signal 9 (KILL)
systemd-udevd[153]: 00-065800000000: Worker [29277] failed
systemd[1]: logstash.service: Main process exited, code=killed, status=9/KILL
systemd[1]: logstash.service: Failed with result 'signal'.
systemd[1]: logstash.service: Service RestartSec=100ms expired, scheduling restart.
systemd[1]: logstash.service: Scheduled restart job, restart counter is at 1.
systemd[1]: Stopped logstash.
systemd[1]: Started logstash.

Hello @Sonkun

First of all, Logstash is being killed by the operating system.
The OOM killer picks the process using the largest amount of memory.
It was triggered because a modprobe needed some free memory and there was none left.

Secondly, you're running Elasticsearch, Kibana and Logstash (and many other services) on the same host.
This requires constraining the resource usage of all three processes.

I would start by adding the -XX:MaxDirectMemorySize= flag you used for Logstash to Elasticsearch as well, set to half of Elasticsearch's max JVM heap.
What is the max JVM heap size for Elasticsearch?

Keep in mind that Kibana (Node.js) sets its maximum heap to 1.4 GB.
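
For example, if Elasticsearch runs with a 1 GB heap (an assumption until you confirm the actual -Xmx), its jvm.options would get something like:

# Elasticsearch jvm.options: cap direct memory at half of the heap
-XX:MaxDirectMemorySize=512m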

That's all true, but my Elasticsearch, Kibana and other tools have stable RAM usage; only Logstash consumes more and more. I'm aware that the OS is killing Logstash, but the problem is not how Logstash gets killed, but why. How does Logstash, which starts at ~200 MB of RAM, end up killed at more than 2 GB when it does almost nothing... and it's configured to use at most 512 MB per pipeline?

Hello @Sonkun

I'm trying to reproduce your problem with the same pipeline and the same JVM (OpenJDK 1.8.0_252), but on a Linux x64 system with a 64-bit JVM.

I ran a while loop to generate some input.

vm@vm-VirtualBox:~/Downloads/logstash-7.6.2$ ps v 152465
    PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND
 152465 pts/0    Sl+    4:15      0     2 4095845 672016  8.2 /home/vm/Downloads/jdk-8u252-ojdkbuild-linux-x

vm@vm-VirtualBox:~/Downloads/logstash-7.6.2$ ps v 152465
    PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND
 152465 pts/0    Sl+    4:56      0     2 4095845 673080  8.2 /home/vm/Downloads/jdk-8u252-ojdkbuild-linux-x

vm@vm-VirtualBox:~/Downloads/logstash-7.6.2$ ps v 152465
    PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND
 152465 pts/0    Sl+    5:11      0     2 4095845 673472  8.2 /home/vm/Downloads/jdk-8u252-ojdkbuild-linux-x6

vm@vm-VirtualBox:~/Downloads/logstash-7.6.2$ ps v 152465
    PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND
 152465 pts/0    Sl+    5:42      0     2 4096413 740576  9.0 /home/vm/Downloads/jdk-8u252-ojdkbuild-linux-x6

vm@vm-VirtualBox:~/Downloads/logstash-7.6.2$ ps v 152465
    PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND
 152465 pts/0    Sl+    5:47      0     2 4096413 740672  9.0 /home/vm/Downloads/jdk-8u252-ojdkbuild-linux-x6

With no activity on the HTTP input:

vm@vm-VirtualBox:~/Downloads/logstash-7.6.2$ ps v 264000
    PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND
 264000 pts/0    Sl+    1:51      0     2 4087661 580836  7.1 /home/vm/Downloads/jdk-8u252-ojdkbuild-linux-x6

vm@vm-VirtualBox:~/Downloads/logstash-7.6.2$ ps v 264000
    PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND
 264000 pts/0    Sl+    1:54      0     2 4089693 601152  7.3 /home/vm/Downloads/jdk-8u252-ojdkbuild-linux-x6

vm@vm-VirtualBox:~/Downloads/logstash-7.6.2$ ps v 264000
    PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND
 264000 pts/0    Sl+    1:55      0     2 4089693 602116  7.3 /home/vm/Downloads/jdk-8u252-ojdkbuild-linux-x6

There is some slight overhead being added to the RSS over time.
This is not a leak in the JVM heap but something in native memory (which might be a direct buffer).
I'll try to dig deeper with jemalloc to understand whether there's a leak (ref).
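
A rough sketch of that approach (the library path is an assumption for Raspbian armhf, and actual leak profiling additionally needs a jemalloc build with profiling enabled; stats_print works on any build):

export LD_PRELOAD=/usr/lib/arm-linux-gnueabihf/libjemalloc.so.2   # assumed path; adjust to your system
export MALLOC_CONF=stats_print:true                               # print allocator statistics when the process exits
bin/logstash                                                      # or however you normally start Logstash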

The HTTP input uses Netty under the hood.
I've tried running it with:

-XX:NativeMemoryTracking=detail
-XX:MaxDirectMemorySize=4m
-Dio.netty.leakDetectionLevel=PARANOID
-Dio.netty.leakDetection.maxRecords=32

But I got no errors and no logs.

I'll also try comparing the behaviour with JDK 11.
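
For reference, once -XX:NativeMemoryTracking is enabled as above, the JVM's own per-area accounting can be inspected with jcmd (a sketch; <ls pid> is the Logstash PID and jcmd must run as the same user as Logstash):

jcmd <ls pid> VM.native_memory baseline       # record a starting point
# ...wait while the RSS grows...
jcmd <ls pid> VM.native_memory summary.diff   # show which memory areas grew since the baseline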

Thanks Luca.
I tried a few other options (changing the JVM [OpenJDK vs AdoptOpenJDK]; Logstash only works on JVM 8 on the RPi, newer JVMs have problems with setup/install [--module-path from the Ruby scripts doesn't work on 8]), and the problem is the same.
I've upgraded the whole ELK stack to 7.7.0 and am now waiting for results.

Please be aware we do not officially support Logstash / Elasticsearch / Kibana on ARM architectures.
