Question regarding Logstash Horizontal Scaling

Hi All,

Hope I can get some clarity on the documentation supplied by Logstash.

We're looking at implementing high availability for Logstash (Elasticsearch clustering is working fine).

The Logstash documentation at https://www.elastic.co/guide/en/logstash/current/deploying-and-scaling.html states:

Logstash is horizontally scalable and can form groups of nodes running the same pipeline

We are aware that you can load balance with HAProxy, Beats, and/or a hardware load balancer, but that line does not explain how to get Logstash load balancing working. It's very vague.

There is very little supporting documentation regarding this. How is this achieved? From what I've seen, there is no support for HA / clustering in Logstash at this moment in time.

Does anyone perhaps have any input on the above? Any feedback would be greatly appreciated.

As I understand it, there are a couple of options for scaling Logstash horizontally. One is to have a few Logstash instances feed data into Kafka topics. You then run a set of worker Logstash instances that form a consumer group, read the data from those topics, and process it.

Although this is an option, I would recommend looking at Elasticsearch ingest pipelines and trying to implement it that way. There is a lot less management overhead with ingest pipelines, and HA is included automatically because Elasticsearch itself is clustered.

Hi nimda,

Appreciate the reply.

I'm looking for an actual explanation of what the Logstash documentation means by the sentence I quoted above.

The Elastic team seems to be saying that horizontal scaling is supported in the form of a cluster (as with ES), but that doesn't appear to be the case.

If anyone could elaborate on what Elastic actually means by the statement above and how to implement that, it would be greatly appreciated.

Thanks again!

Logstash does not support clustering in the sense that members of the cluster coördinate with one another.

If you have more than one Logstash instance running the same pipeline, then it does not matter which instance processes any particular event (unless you are using filters like aggregate that require all events to go through a single worker thread; obviously events cannot go through the same worker thread if they are in different processes). In most cases you can scale capacity by adding more instances. You can give Beats a list of instances and tell it to load balance across them. On the Logstash side there really isn't much to document.

I think what this means is that you can deploy multiple Logstash instances with the same config and in that way form a kind of cluster.

After this you can list multiple destinations (multiple Logstash instances) in your Beats config, and Beats (with load balancing enabled) will automatically distribute the collected logs across them. Beats will also keep track of whether a destination is unavailable and send the logs to the ones that are available.
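
As a rough sketch of the Beats side, a filebeat.yml fragment might look something like the one below. The hostnames and port are placeholders I made up for illustration (nothing in this thread specifies them), and it assumes each Logstash instance runs the same pipeline with a beats input listening on that port.

```yaml
# filebeat.yml (fragment) - hypothetical hosts, adjust to your environment
output.logstash:
  # Every instance listed here is assumed to run the same Logstash pipeline.
  hosts: ["logstash1.example.com:5044", "logstash2.example.com:5044"]
  # Spread events across all listed hosts. When this is false (the default),
  # Filebeat sends to a single host and only switches if it becomes unreachable.
  loadbalance: true
```

With this in place, adding capacity should mostly be a matter of starting another Logstash instance with the same config and adding it to the hosts list.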
