I am currently running into performance issues with Logstash. I am using the following architecture:
kafka -> logstash -> elasticsearch
We have 3 Kafka brokers running on separate machines, 2 Logstash instances, and 5 Elasticsearch nodes, also running on separate machines.
Kafka - We have 17 topics with 2 partitions each. I see the logs arriving with almost 0 delay, and the machines where the brokers are running seem to be working fine.
Logstash - We have 18 pipelines in each Logstash instance. One of them consumes from Kafka and distributes the events to the other pipelines based on [@metadata][kafka][topic]. The issue started when we rolled out Packetbeat to a large group of machines, which sends events at a very high rate (1.3k eps) to one particular pipeline. Since then, the logs from this pipeline arrive delayed. Another pipeline shows the same behavior; the rest of the pipelines do not seem to be affected by this performance issue, since they have no delay.
Our servers are located in Azure, and we upgraded the size to B8 (previously we were using B4s) to get more capable machines, but without success. B8 has 8 CPUs and 32 GB RAM. I have also tried tuning Logstash with consumer_threads, which is set to 2, since we have 2 partitions on the Kafka topics. Currently the load average on both of these machines is very high, around 22 on each.
Right after we upgraded the machines, both Logstash instances seemed fine, with load averages between 2 and 5, but after 1/2 hours the load average increased and we started getting massive delays on the pipelines mentioned above.
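To put the load averages in perspective, here is a rough sketch of the load per core, assuming the 8 CPUs of the B8 size. The helper name `load_per_core` is just mine for illustration; the numbers are the ones quoted above.

```python
def load_per_core(load_average: float, cpu_count: int) -> float:
    """Return the load average divided by the number of CPUs."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_average / cpu_count


CPU_COUNT = 8  # B8 size, as stated above

# Load averages observed right after the upgrade (2 to 5) and currently (~22).
for label, load in [("after upgrade, low", 2.0),
                    ("after upgrade, high", 5.0),
                    ("current", 22.0)]:
    print(f"{label}: {load_per_core(load, CPU_COUNT):.2f} per core")
```

As far as I understand, a value well above 1 per core means there are more runnable tasks (on Linux, also tasks blocked on I/O) than cores available, which is where both machines are now.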
Elasticsearch - We have 3 hot nodes and 2 warm and cold nodes.
I have tried everything I can think of and still have no solution.
Do you have any guesses? Should I scale out my Logstash instances further (probably horizontally now) and move to 3 partitions instead of 2?
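To make the partition question concrete, this is the arithmetic I have in mind. It is only a sketch: it assumes both Logstash instances use the same Kafka consumer group (I have not shown our group_id here), and it relies on the Kafka rule that each partition is read by at most one consumer in a group. The function names are hypothetical.

```python
def total_consumers(logstash_instances: int, consumer_threads: int) -> int:
    """Consumers in the group, assuming all instances share one group_id."""
    return logstash_instances * consumer_threads


def active_consumers_for_topic(partitions: int, consumers_in_group: int) -> int:
    """A partition is read by at most one consumer in a group, so the
    number of consumers that can read a single topic in parallel is
    capped by that topic's partition count."""
    return min(partitions, consumers_in_group)


consumers = total_consumers(logstash_instances=2, consumer_threads=2)

# Current layout: the busy topic has 2 partitions.
print(active_consumers_for_topic(partitions=2, consumers_in_group=consumers))

# Proposed layout: 3 partitions for the busy topic.
print(active_consumers_for_topic(partitions=3, consumers_in_group=consumers))
```

If that assumption holds, the busy topic can only be read by 2 consumers in parallel today, no matter how many Logstash instances I add, and 3 partitions would raise that cap to 3.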
Regards and thank you