Ingest of logs slows down over time

First of all, a big "thanks" to all of the Logstash and ELK Stack developers. Your product is certainly improving my job and is a mind-blowing improvement over the tool that we are transitioning away from.

We receive the logs that we parse (mostly Apache web server logs) on a monthly basis, so at the end of each month I dump a month's worth of logs into the directories where the file inputs can pick them up. Logstash starts chewing through these logs at light speed, but after a couple of days there is a noticeable slowdown in the rate at which they are parsed. Eventually, gaps start to appear between batches of parsed logs, and these gaps only lengthen with time. The worst that I have seen is almost 48 hours with no logs being indexed into Elasticsearch, after which a new log is suddenly processed. Sometimes, when the "floodgates" reopen, a huge number of Logstash events are processed at once, leaving me scratching my head wondering why Logstash didn't see them earlier.

I am sure that there is something wrong with my Logstash config and would greatly appreciate any feedback to help alleviate this issue.

Config: https://pastebin.com/1aN68BFm

Specs:

RAM: 128GB DDR3 1600MHz
CPU: 2 x 6-core Intel Xeon E5-2667 2.90GHz (hyper-threading enabled)
OS: CentOS 6.9
Logstash: v. 5.3.2
Java: 1.8.0_131

Many thanks for the assistance!

What are your -Xmx and -Xms sizes for the Logstash process? Without any additional information, it sounds like you are running into heap issues. If you start thrashing the heap and garbage collection cannot keep up, you can end up "stuck" in a GC loop, which leads to very low ingest rates.
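One rough way to check for a GC loop is to compare two JVM stats samples taken some time apart and see what fraction of the wall-clock time went to garbage collection. The sketch below assumes each sample is a dict shaped like the Logstash node stats JVM output, with a `jvm.gc.collectors.<name>.collection_time_in_millis` field; those field names are an assumption and should be checked against what your Logstash version actually returns. `gc_overhead` is a hypothetical helper name, not part of any Logstash tooling.

```python
def gc_overhead(earlier, later, elapsed_ms):
    """Return the fraction of wall-clock time spent in GC between two samples.

    `earlier` and `later` are stats dicts (field layout assumed, see above);
    `elapsed_ms` is the wall-clock time between the two samples, measured by
    whoever took them.
    """
    if elapsed_ms <= 0:
        raise ValueError("elapsed_ms must be positive")

    def total_gc_ms(sample):
        # Sum the cumulative GC time across all collectors (e.g. young, old).
        collectors = sample["jvm"]["gc"]["collectors"]
        return sum(c["collection_time_in_millis"] for c in collectors.values())

    return (total_gc_ms(later) - total_gc_ms(earlier)) / elapsed_ms
```

A value that stays close to 1.0 across samples would suggest the JVM is spending most of its time collecting rather than processing events, which would be consistent with the heap being too small for the workload.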

I would pull info from the Logstash monitoring API to get more insight into what is going on.
https://www.elastic.co/guide/en/logstash/current/monitoring.html
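As a minimal sketch of pulling heap numbers from that API: the code below assumes the API is listening on the default `localhost:9600` and that `/_node/stats/jvm` returns `jvm.mem.heap_used_percent`, `heap_used_in_bytes`, and `heap_max_in_bytes`. Verify those field names against your own instance. The function names `fetch_jvm_stats` and `heap_summary` are made up for illustration.

```python
import json
import urllib.request


def fetch_jvm_stats(base_url="http://localhost:9600", timeout=5):
    """Fetch the JVM section of the Logstash node stats as a dict."""
    url = f"{base_url}/_node/stats/jvm"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def heap_summary(stats):
    """Reduce a node stats dict to a few heap figures (sizes in MiB)."""
    mem = stats["jvm"]["mem"]
    mib = 2 ** 20
    return {
        "heap_used_percent": mem["heap_used_percent"],
        "heap_used_mb": mem["heap_used_in_bytes"] / mib,
        "heap_max_mb": mem["heap_max_in_bytes"] / mib,
    }
```

Calling `heap_summary(fetch_jvm_stats())` every minute or so and logging the result should show whether heap usage is pinned near the maximum during the periods when ingest stalls.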

Even better, stand up Prometheus and a logstash_exporter, then graph the data using Grafana. This will give you a visual picture of performance over time.

I'm pretty sure that I'm just using the default JVM settings.

Xmx: 1g
Xms: 256m

I also noticed that the stack size per thread (Xss) was set to only 2048k, but I don't know if that can cause these problems.

I'll toss a bunch more input files into the system and see how it behaves using these profiling tools.

Thanks!
