We are running Filebeat as a DaemonSet in Kubernetes, and Logstash as a 12-node StatefulSet with persistent queues. Filebeat is configured with the hostname of each Logstash node, and loadbalance is set to true.
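For context, the relevant pieces of the configuration look roughly like this (the hostnames, port, and queue size below are placeholders, not our exact values):

```yaml
# filebeat.yml (excerpt) -- hostnames/port are placeholders
output.logstash:
  hosts:
    - "logstash-0.logstash:5044"
    - "logstash-1.logstash:5044"
    # ... one entry per StatefulSet pod, 12 in total
    - "logstash-11.logstash:5044"
  loadbalance: true
```

```yaml
# logstash.yml (excerpt) -- persistent queue enabled; size is a placeholder
queue.type: persisted
queue.max_bytes: 8gb
```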
During normal operation everything works just fine: there is no backpressure and the Logstash queues don't fill up. However, once there is enough backpressure that the queues do fill up, we see some strange behavior.
Monitoring the Logstash queues, we see all of them except one eventually drain. That one remains full or nearly full for a long time, until the backpressure in Filebeat is relieved enough that the input volume to that Logstash node drops below its output volume.
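(For reference, we watch queue depth through the Logstash node stats API, with something like the following, assuming the default API port 9600:)

```
curl -s 'http://localhost:9600/_node/stats/pipelines?pretty'
```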
During this time, small bursts of messages get sent through the other 11 Logstash nodes, but those queues all drain quickly, while the one node's queue remains nearly full. The problem node is indeed processing its queue, but it is also still receiving new messages.
This perplexes us: with load balancing enabled in Filebeat, one would expect events to be distributed roughly evenly across all nodes.
We have seen this multiple times, most recently when our Elasticsearch cluster went down and Logstash backed up. Once we got Elasticsearch working again, we saw the same behavior: all of the Logstash nodes drained their queues except one, which seemed to keep receiving events as fast as it was outputting them.
Anyone have any ideas?