50 pipelines and two instances

Hello.
I have two nodes on which Logstash and Elasticsearch run.
At the moment I am using only one node in production for Logstash: 50 pipelines, processing approximately 600,000 records per hour (using the http_poller input on a schedule). Each node has a 16-thread AMD Threadripper 1900X CPU (https://www.newegg.com/Product/Product.aspx?Item=N82E16819113457).
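
For illustration, each pipeline looks roughly like this (simplified; the URL, index name, and ES host below are placeholders, not my real config):

input {
  http_poller {
    # placeholder endpoint; each of the 50 pipelines polls its own source
    urls => {
      source_a => "http://example.local/api/records"
    }
    schedule => { cron => "5 * * * *" }
    codec => "json"
  }
}
output {
  elasticsearch {
    hosts => ["http://es-node1:9200"]
    index => "records-%{+YYYY.MM.dd}"
  }
}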

Hence two questions about recommendations:

1).
If I use the same Logstash configuration on both nodes, loading data into the same cluster, will both Logstash instances load the same data twice? Since clustering is not possible, load balancing the pipelines on Logstash 6.4.0 seems possible only by splitting the config: running 50% of it on the 1st node and the other 50% on the 2nd node.
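
That split would presumably be done in pipelines.yml on each node, roughly like this (pipeline IDs and paths are placeholders):

1st node:
- pipeline.id: source-01
  path.config: "/etc/logstash/conf.d/source-01.conf"
# ... pipelines 01 through 25

2nd node:
- pipeline.id: source-26
  path.config: "/etc/logstash/conf.d/source-26.conf"
# ... pipelines 26 through 50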

2).
pipeline.batch.size: 125 (values in excess of the optimum range cause performance degradation due to frequent garbage collection or JVM crashes related to out-of-memory exceptions)
pipeline.batch.delay: 5

My jvm.options:
-Xms8g
-Xmx8g

According to Kibana, heap usage never went above 3 GB.
Does it make sense to increase pipeline.batch.size?
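If so, the override could go per pipeline in pipelines.yml, roughly like this (pipeline ID, path, and the value 250 are placeholders):

- pipeline.id: source-01
  path.config: "/etc/logstash/conf.d/source-01.conf"
  pipeline.batch.size: 250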

At the moment I am using only one node in production for Logstash: 50 pipelines, processing approximately 600,000 records per hour (using the http_poller input on a schedule).

That's about 167 events/s. Unless you're doing extensive filtering, a single core should handle that load quite comfortably.

If I use the same Logstash configuration on both nodes, loading data into the same cluster, will both Logstash instances load the same data twice?

Yes.

Since clustering is not possible, load balancing the pipelines on Logstash 6.4.0 seems possible only by splitting the config: running 50% of it on the 1st node and the other 50% on the 2nd node.

What problem are you trying to solve? Single point of failure or load distribution (i.e. performance)?

Hello Magnus.
Appreciate the reply and apologies for not being precise from the beginning.

I am trying to solve the single point of failure problem.
I should rephrase the issue: it's approximately 600,000-1,000,000 events processed within a 5-minute window (scheduler trigger cron expression: 5 * * * *), which works out to 2,000-3,300 events/s. The CPU is saturated at 100% during that window.

Avoiding the single point of failure for the fetching from the database is tricky; I don't have a simple solution to that. However, if you figure something out, or simply accept the situation, you could send those messages into a message broker. You can then have multiple Logstash instances reading from a queue in that broker and passing events on to ES. That improves both performance (assuming your ES setup scales along) and fault tolerance.
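
A minimal sketch of such a consumer pipeline, assuming Kafka as the broker (the topic, group, broker, and ES host names are placeholders):

input {
  kafka {
    bootstrap_servers => "broker1:9092"
    topics => ["records"]
    # same group_id on both nodes, so each event is consumed by only one instance
    group_id => "logstash-consumers"
    codec => "json"
  }
}
output {
  elasticsearch {
    hosts => ["http://es-node1:9200"]
    index => "records-%{+YYYY.MM.dd}"
  }
}

Running the same pipeline on both nodes with the same group_id lets Kafka balance partitions between the two instances, so each event is normally consumed once, and the surviving instance takes over if one node fails.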
