Best load balancing solution for logstash service

Hi Team,

Currently we have Logstash deployed on AWS ECS with service discovery (DNS), which creates DNS records pointing to the task containers. We point Filebeat at these domain names. With this setup, because of the DNS TTL, the consumers (Filebeat) keep pointing to the same containers until the TTL expires, leaving the other containers idle, so this solution does not use the Logstash service effectively. Can we use an Application Load Balancer to serve Logstash requests, or is there another approach to load balance Logstash traffic?


tomcat (filebeat monitors log file changes) -> logstash -> elasticsearch

Thanks & Regards,
Saikiran Pulijala

I have tried hosting the Logstash service on ECS behind an Application Load Balancer, but when Filebeat tries to reach the load balancer DNS name it gets these errors:

{"log.level":"error","@timestamp":"2024-05-21T10:04:25.977+0530","log.logger":"publisher_pipeline_output","log.origin":{"":"pipeline/client_worker.go","file.line":174},"message":"failed to publish events: client is not connected","":"filebeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-05-21T10:04:25.977+0530","log.logger":"publisher_pipeline_output","log.origin":{"":"pipeline/client_worker.go","file.line":137},"message":"Connecting to backoff(async(tcp://","":"filebeat","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-05-21T10:07:12.028+0530","log.logger":"logstash","log.origin":{"":"logstash/async.go","file.line":280},"message":"Failed to publish events caused by: lumberjack protocol error","":"filebeat","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-05-21T10:07:12.028+0530","log.logger":"logstash","log.origin":{"":"logstash/async.go","file.line":280},"message":"Failed to publish events caused by: lumberjack protocol error","":"filebeat","ecs.version":"1.6.0"}

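Hi @Saikiran_Pulijala,

You can enable load balancing in the Filebeat Logstash output by listing several hosts and setting loadbalance to true.

  output.logstash:
    # Filebeat spreads published batches across all listed hosts
    hosts: ["localhost:5044", "localhost:5045"]
    loadbalance: true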

Check more on logstash scalability.


Generally, no. Typically your load balancer does not balance application requests; it balances connection requests.

This is not a trivial difference. Imagine you have 4 beats each load-balancing across the same two logstash instances. If you restart one of the logstash instances then it can take over a minute to get the JVM back up. In that time all of the beats may connect to the other logstash instance. You will have 4 beats all talking to one logstash, and one logstash idle, and the balancing architecture working as designed!


Application Load Balancers like the AWS one normally only work for HTTP or HTTPS. Beats does not use HTTP or HTTPS; it uses its own protocol (lumberjack) over TCP, so you need a Network Load Balancer, not an Application Load Balancer.

That's the reason for the errors you got.
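
To illustrate what the Network Load Balancer side could look like, here is a rough CloudFormation sketch with a plain TCP listener. The resource names, the parameters, the internal scheme, and the `ip` target type are all assumptions for the sake of the example, and port 5044 is just the port used in the examples in this thread; adjust everything to your own environment.

```yaml
# Hypothetical sketch: resource names, parameters and port are placeholders.
AWSTemplateFormatVersion: "2010-09-09"
Parameters:
  VpcId:
    Type: AWS::EC2::VPC::Id
  SubnetIds:
    Type: List<AWS::EC2::Subnet::Id>
Resources:
  LogstashNlb:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    Properties:
      Type: network          # layer 4, so the Beats TCP stream passes through untouched
      Scheme: internal       # assumption: Filebeat runs inside the same network
      Subnets: !Ref SubnetIds
  LogstashTargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      VpcId: !Ref VpcId
      Protocol: TCP
      Port: 5044
      TargetType: ip         # assumption: ECS tasks use awsvpc networking
      HealthCheckProtocol: TCP
  LogstashListener:
    Type: AWS::ElasticLoadBalancingV2::Listener
    Properties:
      LoadBalancerArn: !Ref LogstashNlb
      Protocol: TCP
      Port: 5044
      DefaultActions:
        - Type: forward
          TargetGroupArn: !Ref LogstashTargetGroup
```

The ECS service would then register its Logstash tasks with this target group instead of (or in addition to) the service discovery records.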

You can create a Network Load Balancer pointing to your Logstash hosts, but you also need some settings in the Logstash output of your Filebeat configuration.

Basically you need to add these settings:

pipelining: 0
loadbalance: false
ttl: 2m

Since you will have only one host in the output.logstash.hosts setting, loadbalance is set to false. The ttl value is how long a connection to Logstash is kept before Beats drops it and opens a new one. This is required when Logstash sits behind a load balancer, because the Beats connection to Logstash is sticky and would otherwise lead to an uneven distribution. pipelining is set to 0 because the ttl option does not work with pipelining enabled.
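
Put together, the output section of filebeat.yml might look roughly like this. The host name is a made-up placeholder for your load balancer DNS name, and 2m is only an example value for the ttl.

```yaml
output.logstash:
  # Hypothetical NLB DNS name; replace with your own load balancer address.
  hosts: ["logstash-nlb.example.internal:5044"]
  # Only one host is listed, so client-side load balancing is off.
  loadbalance: false
  # Disable pipelining, since ttl is not supported with it enabled.
  pipelining: 0
  # Reconnect every 2 minutes so the load balancer can pick a new target.
  ttl: 2m
```

A shorter ttl rebalances connections faster at the cost of more frequent reconnects.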

The documentation has more information about those settings.


@ashishtiwari1993 ,
Thanks for the quick response. In our use case we have the load balancer DNS name as the common connection point to the Logstash containers behind the ALB.