We're running a set of Auditbeat agents that forward their logs to a Logstash instance. Whenever we restart Logstash, some of the Auditbeat agents lose their connection and fail to reconnect. Curiously, their own logs claim that they reconnected successfully:
Feb 26 16:19:26 auditbeat: 2020-02-26T16:19:26.028Z ERROR logstash/async.go:256 Failed to publish events caused by: client is not connected
Feb 26 16:19:27 auditbeat: 2020-02-26T16:19:27.043Z ERROR pipeline/output.go:121 Failed to publish events: client is not connected
Feb 26 16:19:27 auditbeat: 2020-02-26T16:19:27.043Z INFO pipeline/output.go:95 Connecting to backoff(async(tcp://10.1.1.2:5000))
Feb 26 16:19:30 auditbeat: 2020-02-26T16:19:30.549Z INFO pipeline/output.go:105 Connection to backoff(async(tcp://10.1.1.2:5000)) established
Feb 26 16:19:30 auditbeat: 2020-02-26T16:19:30.559Z INFO [publisher] pipeline/retry.go:196 retryer: send unwait-signal to consumer
Feb 26 16:19:30 auditbeat: 2020-02-26T16:19:30.559Z INFO [publisher] pipeline/retry.go:198 done
...but no events from these agents show up in the Elasticsearch cluster that Logstash forwards to.
Restarting the Auditbeat agents fixes the problem, but that doesn't seem like a robust solution. Interestingly, Filebeat (which we also run) recovers its connections without any issue.
The version of the agent that I am running is:
auditbeat version 7.6.0 (amd64), libbeat 7.6.0 [6a23e8f8f30f5001ba344e4e54d8d9cb82cb107c built 2020-02-05 23:03:32 +0000 UTC]
Has anyone spotted this behaviour before? This feels like a bug to me, but I wanted to check before opening a GitHub issue.