Filebeat only goes to one of the Logstash Servers that is behind an ELB

I have the following test scenario:
Filebeat --> AWS ELB --> (2) Logstash servers.
Both logstash servers have the exact same configuration.
The log output from a single Filebeat server only hits one of the Logstash servers behind the ELB.
I would expect it to round-robin between the two Logstash servers that are behind the ELB.
I am using TCP for my port on the ELB so there is no session affinity.
Is there a trick to getting the round-robin to happen?
Please advise.
Regards,
Gary

Isn't this an ELB problem, not an LS/Beats one?

If you use a load balancer exposing a single IP, I believe Filebeat will assume you are trying to connect to a single Logstash instance. Filebeat uses persistent connections so will only reconnect once the connection is broken. If you instead provide the hosts directly in the Filebeat config, you can configure it to load balance between the specified instances.
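To illustrate, a minimal sketch of what "providing the hosts directly" might look like in filebeat.yml — hostnames here are placeholders, and the exact key layout varies between Filebeat versions:

```yaml
# Hedged sketch: list the Logstash instances explicitly instead of an ELB
# endpoint, and let Filebeat balance across them itself.
output:
  logstash:
    hosts: ["logstash01.example.com:5044", "logstash02.example.com:5044"]
    loadbalance: true
```

With `loadbalance: true`, Filebeat distributes event batches across all listed hosts rather than treating the extra hosts purely as failover targets.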


Thanks for the replies. I appreciate your time in responding to my questions. I know this looks and smells like an ELB issue, but I have a number of other applications sending traffic to ELBs and none of those have this issue; they all round-robin between the n backend servers behind the ELB. I want to use an ELB to avoid having to hard-code the IPs of the target Logstash servers and to use AWS's auto-scaling capabilities. I put the question to the forum to see if there was something special that needed to be done to use Filebeat with an ELB.

Might be a stupid question, but did you configure both logstash servers in the hosts and set loadbalance: true, such as:
hosts: ["logstash01:5044", "logstash02.corp.pvt:5044"]
loadbalance: true

loadbalance defaults to off, and Filebeat will only send to a single logstash instance unless it fails over. Take logstash01 down to see if it goes to 02. If so, that's probably it; otherwise I likely misunderstood your question.

ELB works by assigning a different backend server per connection made (I think based on DNS). That is, it only works well with non-persistent connections: for ELB to properly load-balance, your application needs to reconnect every so often, in the most extreme case once per log line. This mode of operation increases latency, reduces throughput, and adds extra packets/bytes on the network, which makes it a quite expensive but flexible solution.

Beats->logstash is using a persistent TCP connection. In order to work 'well' with ELB, you will need multiple beats-instances + some method to break connections every so often (force filebeat to reconnect). Related ticket: https://github.com/elastic/beats/issues/661

The only advantage of beats dropping connections after a TTL is that they will reconnect and 'rebalance' based on the current ELB state. But once connected, beats will send to the assigned logstash only, so load-balancing will be pretty sub-optimal. One can try to improve this by configuring worker: X to spawn multiple workers, each connecting to logstash via the ELB.
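As a rough sketch of the worker idea, something like the following in filebeat.yml — the ELB hostname is a placeholder, and exact key names depend on the Filebeat version:

```yaml
# Hedged sketch: several workers, each opening its own connection through
# the ELB. The ELB may pin each connection to a different backend, giving
# coarse, per-connection balancing rather than per-event round-robin.
output:
  logstash:
    hosts: ["logstash-elb.example.com:5044"]  # hypothetical ELB endpoint
    worker: 4
    loadbalance: true
```

This only spreads load at connection granularity: each worker still sticks to whichever Logstash the ELB assigned it until that connection breaks.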

Personally I'd love to see some cluster-state-whatever-support (e.g. like elasticsearch sniffing support by some clients) for beats to auto-discover logstash nodes being added/removed and dynamically update the load-balancer. But this is something for far future.

Rather than beats sniffing for auto-discovery, a simplified model would be an option in filebeat to either use a persistent connection or re-establish a connection for each parsed log line. While the second model carries overhead, it more closely simulates a web-traffic scenario: each request, or in this case each parsed log line (single or multiline), would be a separate request from filebeat. This would make filebeat's traffic much friendlier to a load-balanced model. Filebeat would not have to keep track of the downstream logstash systems; it would send and forget. The load balancer, in the case of AWS, could then auto-scale, adding or removing as many backend logstash servers as needed as load demands change.

This is related to issue #661; see the link in my former post. The only problem with this is that filebeat waits for an ACK from logstash before sending another batch of events. Worse, if we implement pipelining it gets a lot more difficult to decide when to drop a connection.

Resolved. A new filebeat version came out that supports Redis. I created an AWS ElastiCache Redis cluster to accept the filebeat input and face logstash. I moved multiline parsing into the filebeat tier and out of the logstash tier, because multiline parsing causes logstash to go single-threaded; plus it is better to do multiline parsing as close to the input data as possible. Thanks for the recommendations @steffens. I used a 1.5M+ record test set containing 10 different timestamp formats and got 100% match results. I now have the following configuration: filebeat (many instances) --> ElastiCache Redis <-- logstash (3) --> elasticsearch cluster (3) <-- Kibana. The new filebeat functionality works great!
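For anyone landing here later, a hedged sketch of what that setup might look like in filebeat.yml — the log path, multiline pattern, and ElastiCache endpoint are all placeholders, and key names differ between Filebeat versions:

```yaml
# Hedged sketch: assemble multiline events in Filebeat, then ship to Redis.
filebeat:
  prospectors:
    - paths: ["/var/log/app/*.log"]       # hypothetical log path
      multiline:
        pattern: '^\d{4}-\d{2}-\d{2}'     # assumed: new events start with a date
        negate: true
        match: after                      # continuation lines attach to the previous event

output:
  redis:
    host: "my-cluster.cache.amazonaws.com"  # hypothetical ElastiCache endpoint
    port: 6379
```

Logstash would then use its redis input to pull from the same key, so any number of Logstash workers can drain the queue without Filebeat knowing about them.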
