TLDR: It appears Logstash's emission rate is being limited by something other than the resources available to it.
Started having a weird issue yesterday with Logstash performing very poorly. Typical ingest over the past couple of weeks has been around 4,500 events/second, and Logstash has been keeping up fine with that, but yesterday the disk queue suddenly started to fill even though there wasn't any unusual increase in events. Once the queue hit about 25GB (max capacity 100GB), I started shutting down agents to reduce the ingest rate. I noticed that as the ingest rate went down, so did the emission rate; it wasn't a 1:1 ratio, but it dropped by roughly the same amount as ingest. After the queue emptied out I turned all the agents back on, and the queue began to fill again but eventually tapered off around 12GB by the time I left for the day. I came in this morning to find that the queue never fully cleared out, even though our overnight ingest rate drops to about 1,000/sec. I've seen this thing hit 10k/sec at times, so why is it failing to keep up?
It seems like if it can handle ingesting 4,500/sec and emitting 4,500/sec at the same time, then it should be capable of processing 9,000/sec. That appears to be wrong, but why? Or is there something wrong with my setup?
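In case it helps anyone looking at this, the per-pipeline in/out rates and queue size can also be read straight off the Logstash node stats API, not just the Kibana monitoring UI. A minimal sketch, assuming Logstash 6.x or later with its HTTP API on the default localhost:9600 (adjust host/port for your install):

```
# Sketch only: assumes Logstash 6.x+ with the HTTP API on localhost:9600
# (on 5.x the endpoint is /_node/stats/pipeline, singular).
curl -s "http://localhost:9600/_node/stats/pipelines?pretty"
# Compare events.in against events.out for the pipeline and check the queue
# section (queue_size_in_bytes): if "in" consistently outpaces "out", the
# slowdown is in the filters or the output, not in what the Beats are sending.
```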
Logstash can only process data as fast as downstream systems are able to accept it. The slowest destination will typically determine the speed. Where are you sending your data? Any issues there?
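If the output is Elasticsearch, one quick way to see whether the destination is pushing back is to check for bulk rejections on the ES side. A rough check, assuming Elasticsearch is reachable on localhost:9200 (hypothetical host; the relevant pool is named "bulk" on 6.x and earlier and "write" on 7.x and later):

```
# List all thread pools with their rejection counters; look at the row for
# the indexing pool ("bulk" on ES 6.x and earlier, "write" on 7.x+).
curl -s "http://localhost:9200/_cat/thread_pool?v&h=node_name,name,active,queue,rejected"
# A steadily climbing "rejected" count means Elasticsearch is applying
# backpressure, which shows up in Logstash as a falling emission rate.
```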
What is the specification of the machine? What do CPU usage, disk I/O, and iowait look like? Do you have monitoring installed so you can see GC and merging activity over time?
This is running on a Server 2012 R2 VM with six 2.4GHz CPUs. Storage didn't look taxed: about 10-20% active time and about 30-40MB/s writes, though it's capable of a whole lot more. I have monitoring set up, whatever is available with basic X-Pack licensing, and I noticed an uptick in old GC collections; usually there are none. However, I'm not sure what the implications of old GC occurring are.
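For what it's worth, old-generation GC is usually the thing to watch: if old collections become frequent or long, the Logstash heap is under pressure and the pipeline stalls while they run. The raw counters are available from the Logstash node stats API; a minimal sketch, again assuming the API on the default port 9600:

```
# JVM heap and GC counters for the Logstash process itself.
curl -s "http://localhost:9600/_node/stats/jvm?pretty"
# Watch mem.heap_used_percent and gc.collectors.old.collection_count /
# collection_time_in_millis: if old collections climb while throughput drops,
# the heap (1GB by default on older Logstash releases) may be too small.
```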
After shutting off all Beats and letting the queue fully drain, I restarted everything at once to see what burst activity would do to it: 33 instances of Metricbeat and Winlogbeat, plus a couple of light audit/Filebeat agents. I saw concurrent ingest/emit rates of over 10k/sec at one point, but the emission rate kept up with the ingest rate. This is also with four Exchange servers dumping ALL possible logs via Winlogbeat. Some of those are incredibly noisy as well.