Optimizing Logstash Performance: Troubleshooting Instability at 10,000 EPS During Stress Tests

I am currently using an architecture with Elastic-Agent for log collection and Logstash for log forwarding. I am conducting stress testing to evaluate the hardware requirements and costs for my collector setup (Elastic-Agent + Logstash) using Apache JMeter to simulate EPS rates of 2000, 5000, and 10000.

The setup is as follows:
Apache JMeter (192.168.3.170) -> Elastic-Agent-[Custom UDP Logs] (192.168.3.172:515) -> Logstash (192.168.3.172:5044) -> Elasticsearch (192.168.3.173:9200).

My current collector setup includes:

  • 4 CPU cores
  • JVM heap: 4 GB
  • Batch size: 125

When simulating 2000 and 5000 EPS, the Logstash Monitoring curves are relatively stable, and the event drop rate does not fluctuate much. However, at 10000 EPS, the Logstash Monitoring curve becomes unstable, and there is significant event drop variation.

(Logstash monitoring screenshots: (1) 2000 EPS, (2) 5000 EPS, (3) 10000 EPS)

I am aware that the event reception rate of Logstash could be affected by factors such as Elasticsearch’s disk performance, shard configuration, and network conditions.

I am looking for a more straightforward and effective way to quickly identify the cause of instability at 10000 EPS and evaluate the hardware requirements and costs for the Elastic-Agent + Logstash collector setup more accurately.
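
A quick way to narrow this down during each run (a sketch, assuming the Logstash monitoring API is enabled on its default port 9600 on the Logstash host) is to sample the per-pipeline node stats and compare where the time is being spent:

# Pull per-pipeline event and plugin statistics from the Logstash monitoring API.
curl -s 'http://localhost:9600/_node/stats/pipelines?pretty'

# events.queue_push_duration_in_millis growing much faster than events.duration_in_millis
# means the inputs are blocked waiting for free workers, i.e. the pipeline cannot keep up.
# A large duration_in_millis on the elasticsearch output plugin points at indexing-side
# backpressure (disk, shards, network to Elasticsearch) rather than at Logstash itself.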

logstash-udp.conf

input {
  elastic_agent {
    port => 5044
    ssl_enabled => true
    ssl_certificate_authorities => ["/etc/logstash/certs/elasticsearch-ca.pem"]
    ssl_certificate => "/etc/logstash/certs/logstash.crt"
    ssl_key => "/etc/logstash/certs/logstash.pkcs8.key"
    ssl_client_authentication => "required"
  }
}

output {
  elasticsearch {
    hosts => ["https://192.168.3.171:9200"]
    data_stream => "true"
    user => "elastic"
    password => "password"
    cacert => "/etc/logstash/certs/elasticsearch-ca.pem"
  }
}

Can you clarify what in the many images you posted shows that? I am not seeing any instability.

I used Apache JMeter to send 2000, 5000, and 10000 events per second and observed the event volume in Kibana along with the Logstash monitoring curves.

The difference between the number of events sent and the number actually received was small at 2000 and 5000 EPS, but much larger at 10,000 EPS:

  1. EPS 2000 sustained for 30 minutes = 3,600,000 events; actual received volume: 3,596,929 hits (~0.1% loss)

  2. EPS 5000 sustained for 30 minutes = 9,000,000 events; actual received volume: 8,974,598 hits (~0.3% loss)

  3. EPS 10,000 sustained for 30 minutes = 18,000,000 events; actual received volume: 16,312,575 hits (~9.4% loss)

Additionally, the curve at EPS 10,000 shows significant instability.

I don't really agree with that, but I think you are seeing delays caused by GC cycles.
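
A quick way to confirm or rule that out (a sketch, assuming shell access to the Logstash host and that the bundled JDK's jstat is on the PATH) is to sample GC activity while the 10,000 EPS test is running:

# Sample JVM heap occupancy and GC counters every second during the test.
# <logstash-pid> is a placeholder for the Logstash Java process ID (e.g. from pgrep -f logstash).
# Old-gen utilisation (O) pinned near 100% and a climbing full-GC count (FGC) would
# support the GC-pause explanation.
jstat -gcutil <logstash-pid> 1000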

Besides what Badger said, you also need to consider that with more events, Elasticsearch performance will have more influence.

You will need to start changing the batch size; 125 is pretty low once you start getting more events, as it results in more requests to Elasticsearch with smaller batches.
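
As an illustration only (500 and 4 are starting points to experiment with, not tuned recommendations), the batch size and worker count can be overridden on the command line when running Logstash in the foreground; the equivalent persistent settings are pipeline.batch.size and pipeline.workers in logstash.yml:

# Run the existing pipeline with a larger batch size and an explicit worker count.
# -b / --pipeline.batch.size: events each worker collects before flushing to the outputs.
# -w / --pipeline.workers: number of parallel pipeline workers (matched to the 4 CPU cores here).
# The config path is assumed; adjust it to wherever logstash-udp.conf actually lives.
bin/logstash -f /etc/logstash/conf.d/logstash-udp.conf -b 500 -w 4

Larger batches mean fewer, bigger bulk requests to Elasticsearch, which is usually cheaper than many small ones; heap usage per worker grows accordingly, so it is worth watching the 4 GB heap while increasing it.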