TLDR: It appears Logstash's emission rate is being limited by something other than the resources available to it.
Started having a weird issue yesterday with Logstash performing very poorly. Typical ingest over the past couple of weeks has been around 4,500 events/second, and Logstash has been keeping up fine with that, but yesterday the disk queue suddenly started to fill even though there wasn't any unusual increase in events. Once the queue hit about 25GB (max capacity 100GB), I started shutting down agents to reduce the ingest rate. I noticed that as the ingest rate went down, so did the emission rate; it wasn't a 1:1 ratio, but it dropped by roughly the same amount as ingest. After the queue emptied out I turned all the agents back on, and the queue began to fill again but eventually tapered off around 12GB by the time I left for the day. I came in this morning to find that the queue never fully cleared out, even though our overnight ingest rate drops to about 1,000/sec. I've seen this thing hit 10k/sec at times, so why is it failing to keep up?
It seems like if it can handle ingesting 4,500/sec and emitting 4,500/sec at the same time, then it should be capable of processing 9,000/sec. That appears to be wrong, but why? Or is there something wrong with my setup?
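In case it helps anyone looking at this, the per-pipeline in/out rates and queue size can also be read straight off the Logstash node stats API, not just the Kibana monitoring UI. A minimal sketch, assuming Logstash 6.x or later with its HTTP API on the default localhost:9600 (adjust host/port for your install):

```
# Sketch only: assumes Logstash 6.x+ with the HTTP API on localhost:9600
# (on 5.x the endpoint is /_node/stats/pipeline, singular).
curl -s "http://localhost:9600/_node/stats/pipelines?pretty"
# Compare events.in against events.out for the pipeline and check the queue
# section (queue_size_in_bytes): if "in" consistently outpaces "out", the
# slowdown is in the filters or the output, not in what the Beats are sending.
```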
Logstash can only process data as fast as downstream systems are able to accept it. The slowest destination will typically determine the speed. Where are you sending your data? Any issues there?
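If the output is Elasticsearch, one quick way to see whether the destination is pushing back is to check for bulk rejections on the ES side. A rough check, assuming Elasticsearch is reachable on localhost:9200 (hypothetical host; the relevant pool is named "bulk" on 6.x and earlier and "write" on 7.x and later):

```
# List all thread pools with their rejection counters; look at the row for
# the indexing pool ("bulk" on ES 6.x and earlier, "write" on 7.x+).
curl -s "http://localhost:9200/_cat/thread_pool?v&h=node_name,name,active,queue,rejected"
# A steadily climbing "rejected" count means Elasticsearch is applying
# backpressure, which shows up in Logstash as a falling emission rate.
```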
What is the specification of the machine? What do CPU usage, disk I/O, and iowait look like? Do you have monitoring installed so you can see GC and merging activity over time?
This is running on a Server 2012 R2 VM with six 2.4GHz CPUs. Storage didn't look taxed: about 10-20% active time and about 30-40MB/s writes, though it's capable of a whole lot more. I have monitoring set up, whatever is available with basic X-Pack licensing, and I noticed an uptick in old GC collections; usually there are none. However, I'm not sure what the implications of old GC occurring are.
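For what it's worth, old-generation GC is usually the thing to watch: if old collections become frequent or long, the Logstash heap is under pressure and the pipeline stalls while they run. The raw counters are available from the Logstash node stats API; a minimal sketch, again assuming the API on the default port 9600:

```
# JVM heap and GC counters for the Logstash process itself.
curl -s "http://localhost:9600/_node/stats/jvm?pretty"
# Watch mem.heap_used_percent and gc.collectors.old.collection_count /
# collection_time_in_millis: if old collections climb while throughput drops,
# the heap (1GB by default on older Logstash releases) may be too small.
```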
After shutting off all Beats and letting the queue fully drain, I restarted everything at once to see what burst activity would do to it: 33 instances of Metricbeat and Winlogbeat, plus a couple of light audit/Filebeat agents. I saw concurrent ingest/emit rates of over 10k/sec at one point, but the emission rate kept up with the ingest rate. This is also with four Exchange servers dumping ALL possible logs via Winlogbeat. Some of those are incredibly noisy as well.