Logstash is horizontally scalable and can form groups of nodes running the same pipeline
We are aware that you can load balance with HAProxy, Beats, and/or hardware load balancers. But that line does not explain how to get load balancing working for Logstash itself; it's very vague.
There is very little supporting documentation on this. How is it achieved? From what I've seen, Logstash has no support for HA / clustering at this moment in time.
Does anyone perhaps have any input on the above? Any feedback would be greatly appreciated.
As I understand it, there are a couple of options for scaling Logstash horizontally. One is to run a few Logstash instances that feed data into Kafka topics. Behind those you run the actual worker Logstash instances, which form a consumer group, read the data from the topics, and process it.
Even though this is an option, I would recommend looking at Elasticsearch pipelines and trying to implement it that way. There is a lot less management overhead with Elasticsearch pipelines, and HA is included automatically because Elasticsearch provides it.
Logstash does not support clustering in the sense that members of the cluster coordinate with one another.
If you have more than one Logstash instance running the same pipeline, then it does not matter which instance processes any particular event. The exception is filters like aggregate, which require all events to go through a single worker thread; obviously events cannot go through the same worker thread if they are in different processes. In most cases you can scale capacity by adding more instances. You can give Beats a list of instances and tell it to load balance across them. On the Logstash side there really isn't much to document.
I think what this means is that you can deploy multiple Logstash instances with the same config and, in this way, form a kind of cluster.
After this you can list multiple destinations (multiple Logstash instances) in your Beats config and, with load balancing enabled, Beats will distribute the collected logs across the given destinations. Beats will also keep track of whether one of the destinations is unavailable and send the logs to the available ones.
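To make the Beats side concrete, here is a minimal sketch of the output section of a filebeat.yml, assuming Filebeat is the shipper and two Logstash instances run the same pipeline with a beats input on port 5044. The hostnames are hypothetical placeholders, not something from this thread.

```yaml
# filebeat.yml (output section only) -- a sketch, not a complete config
output.logstash:
  # Hypothetical hostnames; replace with your own Logstash instances.
  hosts:
    - "logstash-1.example.internal:5044"
    - "logstash-2.example.internal:5044"
  # Spread events across all listed hosts. With the default (false),
  # Filebeat sends to one host and only switches to another when that
  # host becomes unreachable.
  loadbalance: true
```

Each Logstash instance needs nothing special beyond the same pipeline config; the distribution and failover happen entirely in the shipper.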