Hello.
I have two nodes on which Logstash and Elasticsearch run.
At the moment I am using only one node in production for Logstash, with 50 pipelines, processing approximately 600,000 records per hour (running a scheduler with http_poller). Each node has an AMD Threadripper 1900X with 16 threads (https://www.newegg.com/Product/Product.aspx?Item=N82E16819113457).
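Each pipeline's input looks roughly like this (a minimal sketch; the endpoint name and URL are placeholders, and the exact schedule differs per pipeline):

input {
  http_poller {
    # One of the 50 scheduled HTTP inputs; URL is a placeholder.
    urls => {
      records_endpoint => "http://example.com/api/records"
    }
    # Fire once per hour, at minute 5.
    schedule => { cron => "5 * * * *" }
    codec => "json"
  }
}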
Hence two questions about recommendations:
1).
If I use the same Logstash configuration on both nodes, loading data into the same cluster, will both Logstash instances load the same data twice? Since clustering is not possible, load balancing the pipelines with Logstash 6.4.0 seems only possible by splitting the config and running 50% of it on the 1st node and the other 50% on the 2nd node.
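In other words, node 1's pipelines.yml would list the first 25 pipelines and node 2's the remaining 25; a sketch (the pipeline IDs and config paths are placeholders):

# pipelines.yml on node 1 (pipelines 1-25)
- pipeline.id: source-01
  path.config: "/etc/logstash/conf.d/source-01.conf"
- pipeline.id: source-02
  path.config: "/etc/logstash/conf.d/source-02.conf"
# ... and so on up to source-25

# pipelines.yml on node 2 (pipelines 26-50)
- pipeline.id: source-26
  path.config: "/etc/logstash/conf.d/source-26.conf"
# ... and so on up to source-50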
2).
pipeline.batch.size: 125 (values in excess of the optimum range cause performance degradation due to frequent garbage collection or JVM crashes related to out-of-memory exceptions)
pipeline.batch.delay: 5
My jvm.options:
-Xms8g
-Xmx8g
According to Kibana, memory usage never went above 3 GB.
Does it make sense to increase pipeline.batch.size?
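If so, I assume a per-pipeline override in pipelines.yml would look something like this (a sketch; the pipeline ID, path, and the value 250 are placeholders):

- pipeline.id: source-01
  path.config: "/etc/logstash/conf.d/source-01.conf"
  pipeline.batch.size: 250
  pipeline.batch.delay: 5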
At the moment I am using only one node in production for Logstash, with 50 pipelines, processing approximately 600,000 records per hour (running a scheduler with http_poller).
That's about 167 events/s. Unless you're doing extensive filtering, a single core should handle that load quite comfortably.
If I use the same Logstash configuration on both nodes, loading data into the same cluster, will both Logstash instances load the same data twice?
Yes.
Since clustering is not possible, load balancing the pipelines with Logstash 6.4.0 seems only possible by splitting the config and running 50% of it on the 1st node and the other 50% on the 2nd node.
What problem are you trying to solve? Single point of failure or load distribution (i.e. performance)?
Hello Magnus.
Appreciate the reply and apologies for not being precise from the beginning.
I am trying to solve the single point of failure problem.
I should rephrase the issue: it's approximately 600,000-1,000,000 events processed within a 5-minute period (scheduler trigger cron expression: 5 * * * *), so we get 2,000-3,300 events/s. The CPU is saturated at 100% during that time.
Avoiding a single point of failure for the fetching from the database is tricky; I don't have a simple solution for that. However, if you figure something out, or simply accept the situation, you could send those messages into a message broker and then have multiple Logstash instances reading from a queue in that broker and passing things on to ES. That improves both performance (assuming your ES setup scales along) and fault tolerance.
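For example, assuming Kafka as the broker (the server address, topic, and consumer group name below are placeholders), the fetching instance would write to a topic and the indexing instances would consume from it:

# On the fetching instance: forward fetched events to Kafka instead of ES.
output {
  kafka {
    bootstrap_servers => "kafka01:9092"
    topic_id => "db-records"
    codec => "json"
  }
}

# On each indexing instance: consume from the same consumer group so the
# instances share the work instead of duplicating it.
input {
  kafka {
    bootstrap_servers => "kafka01:9092"
    topics => ["db-records"]
    group_id => "logstash-indexers"
    codec => "json"
  }
}

Since all indexing instances join the same group_id, the broker delivers each message to only one of them, which is what gives you both the load sharing and the failover.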