Hi everybody,
We're having an issue with one of our Logstash clusters. There are 6 machines, AWS instances with 32 GB of RAM and 16 cores each, running the Logstash Docker container. These machines are dedicated Logstash servers. RAM usage had been stable for the past months at around 17-18 GB, but as you can see in the image below, the behavior has changed lately:
We haven't changed the version or made any changes to the configuration.
What is stranger is that we have a parallel, similar cluster with exactly the same machines and server versions, processing similar logs at a slightly lower load, and that one has remained stable.
The cluster hasn't had any major change in load either. The issue recurs every 24-28 hours, at which point we have to restart the Logstash containers.
We have a beats pipeline.
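For context, a stripped-down sketch of the pipeline shape (the real filters and outputs are omitted; the port and the elasticsearch output here are placeholders, not our actual settings):

input {
  beats {
    port => 5044                             # placeholder port
  }
}
filter {
  # grok/mutate/etc. filters omitted
}
output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]   # placeholder output
  }
}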
Versions:
Running: logstash:7.17.23
~$ /usr/share/logstash/jdk/bin/java --version
openjdk 11.0.22 2024-01-16
OpenJDK Runtime Environment Temurin-11.0.22+7 (build 11.0.22+7)
OpenJDK 64-Bit Server VM Temurin-11.0.22+7 (build 11.0.22+7, mixed mode)
The container runs on Ubuntu Focal, with Docker version 5:26.1.3-1~ubuntu.20.04~focal.
The Logstash config (logstash.yml):
pipeline.batch.size: 125
pipeline.batch.delay: 50
queue.type: persisted
queue.max_bytes: 450gb
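Since queue.type is persisted with queue.max_bytes at 450gb, one thing we can pull on the affected nodes is the per-pipeline queue stats from the monitoring API, to see whether the queue is filling up around the same time memory climbs (sketch, assuming the API is listening on its default port 9600):

# Per-pipeline stats; the "queue" block for each pipeline shows the event count and queue size in bytes
curl -s 'http://localhost:9600/_node/stats/pipelines?pretty'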
The JVM config (jvm.options):
-Xms16g
-Xmx16g
8-13:-XX:+UseConcMarkSweepGC
8-13:-XX:CMSInitiatingOccupancyFraction=75
8-13:-XX:+UseCMSInitiatingOccupancyOnly
-XX:+HeapDumpOnOutOfMemoryError
-Djava.security.egd=file:/dev/urandom
-Dlog4j2.isThreadContextMapInheritable=true
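To narrow down whether the growth is inside the 16 GB heap or off-heap, these are the kinds of checks we can run on an affected node (sketch; the container name "logstash", the default API port 9600, and the JVM being PID 1 in the container are assumptions):

# Container memory as Docker accounts it
docker stats --no-stream logstash

# JVM heap vs. non-heap as reported by the Logstash monitoring API
curl -s 'http://localhost:9600/_node/stats/jvm?pretty'

# Heap breakdown from the bundled JDK's jcmd (assumes the JVM is PID 1 in the container)
docker exec logstash /usr/share/logstash/jdk/bin/jcmd 1 GC.heap_info

If the heap stays flat around 16 GB while the container usage keeps climbing, the growth would be off-heap (e.g. direct buffers or the memory-mapped persisted queue) rather than a heap leak.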
We're wondering if you could help us debug this behavior.