Logstash machines have a high load average

Hello guys,
I am currently running into some issues with Logstash. I am using the following architecture:

kafka -> logstash -> elasticsearch

We have 3 Kafka brokers running on separate machines, 2 Logstash instances, and 5 Elasticsearch nodes, also running on separate machines.

Kafka - We have 17 topics with 2 partitions each. I see the logs arriving with almost 0 delay, and the machine where the broker is running seems to be working fine.

Logstash - We have 18 pipelines in each Logstash instance. One of them consumes from Kafka and distributes events to the other pipelines based on [@metadata][kafka][topic]. The issue began when we started using Packetbeat on a large group of machines, which sends events at a very high rate (1.3k eps) to one particular pipeline. Since then, the logs from this pipeline arrive delayed. Another pipeline shows the same behavior; the rest of the pipelines do not seem to be affected by this performance issue, since they have no delay.
Our servers are located in Azure, and we upgraded the size to B8 (previously we were using B4s) to get a more capable machine, but without success. B8 has 8 CPUs and 32 GB RAM. I have also tried tuning Logstash with pipeline.batch.size, pipeline.workers and consumer_threads, which is set to 2 since we have 2 partitions on the Kafka topics. Currently the load average on both of these machines is extremely high, around 22 on each.
When we upgraded the machines, both Logstash instances seemed fine, working at load averages between 2 and 5, but after 1/2 hours the load average increased and we started getting massive delays on the pipelines mentioned above.
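
As a rough way to reason about consumer_threads versus partitions, the sketch below simulates a simplified range-style partition assignment in Python. It relies on assumptions the post does not state: that both Logstash instances share one consumer group, that a single kafka input subscribes to all 17 topics, and that the group uses a range-style assignor. The member names and the range_assign helper are hypothetical, made up for illustration only.

```python
from collections import defaultdict


def range_assign(topics, partitions_per_topic, consumers):
    """Simplified range-style assignment (an illustration, not Kafka's code).

    For each topic, partitions are handed out in contiguous ranges to the
    consumers in sorted order; the first consumers receive any remainder.
    """
    members = sorted(consumers)
    assignment = defaultdict(list)
    for topic in topics:
        base, extra = divmod(partitions_per_topic, len(members))
        start = 0
        for index, member in enumerate(members):
            count = base + (1 if index < extra else 0)
            for partition in range(start, start + count):
                assignment[member].append((topic, partition))
            start += count
    return {member: assignment[member] for member in members}


# Figures from the post: 17 topics, 2 partitions each,
# 2 Logstash instances with consumer_threads = 2.
# Topic and member names are hypothetical.
topics = [f"topic-{i}" for i in range(17)]
consumers = [f"logstash{host}-thread{thread}" for host in (1, 2) for thread in (0, 1)]

result = range_assign(topics, 2, consumers)
counts = {member: len(partitions) for member, partitions in result.items()}

for member, count in counts.items():
    print(member, count)
```

Under these assumptions, the two members that sort first own all 34 partitions and the other two stay idle. Whether that matches the real cluster depends on the actual kafka input settings, so treat it as a prompt to check the consumer group assignment rather than as a diagnosis.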

Elasticsearch - We have 3 hot nodes and 2 warm and cold nodes.

I have tried everything and still have not found a solution.
Do you have any guesses? Should I scale out my Logstash instances further (now probably horizontally) and use 3 partitions instead of 2?

Regards and thank you

What version of things are you running?
Are you using the Stack Monitoring functionality to see what is happening?
How large are your pipelines?
What do the Logstash logs show?

I am using 7.16.2 across the stack.
For monitoring I am using Stack Monitoring, and to be sure I always go to the machines and use, for example, top.
I don't think my pipelines are huge, but I have one with a lot of filtering (it is working fine).
The logs don't show me anything that helps me understand the behavior, but every time I restart Logstash I notice a very slow start compared to what I previously experienced.
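
For comparing the load figures from top across VM sizes, it can help to divide the load average by the CPU count. The small Python sketch below does that for the numbers quoted earlier (load average of about 22 on 8 CPUs) and, on Unix-like systems, for the machine it runs on. The load_per_core helper is a hypothetical name used here for illustration.

```python
import os


def load_per_core(load_average, cpu_count):
    """Return the load average divided by the number of CPUs."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_average / cpu_count


# Figures from the post: load average of about 22 on an 8-CPU machine.
reported = load_per_core(22, 8)
print(f"reported load per core: {reported:.2f}")

# Same calculation for the local machine; os.getloadavg is Unix-only
# and may raise OSError if the load average cannot be obtained.
if hasattr(os, "getloadavg"):
    try:
        one_min, _, _ = os.getloadavg()
        cpus = os.cpu_count() or 1
        print(f"local 1-min load per core: {load_per_core(one_min, cpus):.2f}")
    except OSError:
        print("load average not available on this system")
```

A value well above 1 per core, as with the reported figures, means more runnable or waiting tasks than CPUs, which is consistent with the delays described.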

Just to update: I now have a dedicated machine for one particular pipeline that has a massive event rate, and I upgraded the machines from B8s to F8. Now everything is working fine.
