Logstash RAM usage ramping up: change in behaviour

Hi everybody,

We're having an issue with one of our Logstash clusters. There are 6 AWS instances with 32 GB of RAM and 16 cores, each running a Logstash Docker container. These machines run only Logstash. Machine RAM usage had been stable for the past months at around 17-18 GB. You can see in the image below how the behaviour changed lately:

We haven't changed the version or made any changes to the configuration.
What is stranger is that we have a parallel, similar cluster with exactly the same machine and server versions, processing similar logs at a slightly lower load, but that one has remained stable.

The cluster hasn't had any major change in load either. This now happens every 24-28 hours, at which point we have to restart the Logstash containers.

We have a beats pipeline.
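For context, a minimal sketch of what the input side looks like (the port and settings here are illustrative, not our actual config):

input {
  beats {
    port => 5044    # illustrative port
  }
}
# filter and output blocks omitted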

Versions:
Running: logstash:7.17.23

~$ /usr/share/logstash/jdk/bin/java --version
openjdk 11.0.22 2024-01-16
OpenJDK Runtime Environment Temurin-11.0.22+7 (build 11.0.22+7)
OpenJDK 64-Bit Server VM Temurin-11.0.22+7 (build 11.0.22+7, mixed mode)

The container is running on Ubuntu Focal with Docker version 5:26.1.3-1~ubuntu.20.04~focal.

The Logstash config:

pipeline.batch.size: 125
pipeline.batch.delay: 50
queue.type: persisted
queue.max_bytes: 450gb

The Java config (jvm.options):

-Xms16g
-Xmx16g
8-13:-XX:+UseConcMarkSweepGC
8-13:-XX:CMSInitiatingOccupancyFraction=75
8-13:-XX:+UseCMSInitiatingOccupancyOnly
-XX:+HeapDumpOnOutOfMemoryError
-Djava.security.egd=file:/dev/urandom
-Dlog4j2.isThreadContextMapInheritable=true
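One thing we are considering is enabling GC logging to check whether the heap itself is growing or whether the extra RAM is off-heap. A sketch of the line we might add to jvm.options (the log path is an assumption based on our container layout):

# JDK 11+ unified GC logging, rotating files
11-:-Xlog:gc*,safepoint:file=/usr/share/logstash/logs/gc.log:time,uptime,level,tags:filecount=8,filesize=64m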

We're wondering if you could help us debug this behavior.

Could somebody point me in the right direction for how to debug this issue?
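For example, is polling the Logstash node stats API for JVM metrics a reasonable starting point? Something along these lines (assuming the default API port 9600 is reachable from the host):

# heap usage and GC counters as reported by Logstash itself
curl -s 'http://localhost:9600/_node/stats/jvm?pretty'

# process-level virtual memory and CPU figures
curl -s 'http://localhost:9600/_node/stats/process?pretty'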

Can you share your pipelines?

It is basically impossible to know what Logstash may be doing without the pipeline configurations.

Did something change in the inputs?

We're quite a big team and there are quite a few changes, so I cannot share the pipelines with you. I was wondering if there is a manual or any guide on how to debug Logstash's memory usage; maybe some profiling would be a good approach?

Not that I know of, but Logstash is a Java application, so you could get a dump of the heap and try to analyze it. I do not have much experience with this, though.
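As a rough sketch (assuming the official image layout, with the Logstash JVM running as PID 1 under the logstash user and the bundled JDK; adjust the container name and PID for your setup), something like this should write a heap dump you can open in a tool such as Eclipse MAT:

# dump the live objects of the Logstash JVM to a file inside the container
docker exec -u logstash my-logstash-container /usr/share/logstash/jdk/bin/jmap \
  -dump:live,format=b,file=/tmp/logstash-heap.hprof 1

# copy the dump out of the container for analysis
docker cp my-logstash-container:/tmp/logstash-heap.hprof .

Since -XX:+HeapDumpOnOutOfMemoryError is already set, a dump should also be written automatically if the JVM ever exhausts the heap.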

Did anything change in the pipelines? For the behaviour to change like that I would expect a change in the load. It does not necessarily need to be an increase in the event rate; changes in the messages themselves could also lead to a change in performance, depending on the filters used.

But without the pipelines it is not possible to even guess at what could be causing this increase in memory usage.