I am evaluating Logstash 6.6 as an upgrade from version 2.4.
With Logstash 2.4 we used a Kafka message queue to ingest data and then push it to Logstash in order to handle high-volume streams.
With Logstash 6.6 and its pipeline enhancements, can we drop the message queue and ingest data from the application hosts into Logstash directly?
Currently our producer rate is around 40-45 messages per second.
Do you think the queue is still essential, or what should we update in logstash.yml to handle roughly 45 msg/sec from 15 servers?
It really depends on what kinds of processing and enrichment you are planning to do in the pipeline: some pipelines are capable of running at tens of thousands of events per second on a single host, while others that do enrichment using high-latency external services are bound at just tens of events per second. The newest versions also include a new Java-based execution engine that you can opt into, which can significantly improve the performance of some pipelines.
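For reference, in Logstash 6.x the Java execution engine is opt-in and is enabled with a single setting in logstash.yml (the exact value shown is the documented flag; whether it helps depends on your pipeline):

```yaml
# logstash.yml -- opt in to the Java execution engine (Logstash 6.x)
pipeline.java_execution: true
```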
In version 6 of the Elastic Stack, it is common to install Filebeat on edge nodes to ship your logs to one or more centralised Logstashes. On the Logstash node you can use the default in-memory queue, which will handle spikes in load by applying back-pressure to the beats, and catching up as it has capacity (allowing the events to spool to disk on the edge nodes as they wait to send). Or, you can use the optional persistent queue, which handles spikes in load by spooling to disk on the Logstash host (which can be helpful if your edge nodes are ephemeral and liable to disappear). Using Kafka or another persistent message broker is still an option, but in many cases it is no longer necessary.
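As a sketch of what that looks like in logstash.yml, the settings that usually matter here are the queue type and sizing plus the worker/batch tuning; the values below are illustrative starting points, not recommendations for your workload:

```yaml
# logstash.yml -- illustrative values, tune for your own workload
queue.type: persisted        # disk-backed persistent queue (default is "memory")
queue.max_bytes: 4gb         # cap on disk space the queue may use
pipeline.workers: 4          # filter/output worker threads (defaults to CPU core count)
pipeline.batch.size: 125     # events each worker pulls per batch
```

At 45 msg/sec from 15 servers the defaults will almost certainly be sufficient; the persistent-queue settings only become relevant if you want durability on the Logstash host itself.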