We recently dropped Redis from our ELK pipeline and now ship logs directly from Filebeat (6.1.3) to two Logstash instances (5.6.2) via the beats input, with persistent queues enabled and loadbalance set to true. However, from time to time we see errors like these in the Filebeat log:
2019-01-08T15:17:14+01:00 ERR Failed to publish events caused by: write tcp 10.21.1.35:60596->10.10.10.180:5044: write: connection reset by peer
2019-01-08T15:17:15+01:00 ERR Failed to publish events: write tcp 10.21.1.35:60596->10.10.10.180:5044: write: connection reset by peer
2019-01-08T15:17:24+01:00 ERR Failed to publish events caused by: write tcp 10.21.1.35:43316->10.10.10.179:5044: write: connection reset by peer
2019-01-08T15:17:25+01:00 ERR Failed to publish events: write tcp 10.21.1.35:43316->10.10.10.179:5044: write: connection reset by peer
2019-01-08T15:17:26+01:00 ERR Failed to connect: dial tcp 10.10.10.180:5044: getsockopt: connection refused
2019-01-08T15:17:27+01:00 ERR Failed to connect: dial tcp 10.10.10.179:5044: getsockopt: connection refused
2019-01-08T15:17:30+01:00 ERR Failed to connect: dial tcp 10.10.10.180:5044: getsockopt: connection refused
2019-01-08T15:17:31+01:00 ERR Failed to connect: dial tcp 10.10.10.179:5044: getsockopt: connection refused
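For context, the relevant part of our filebeat.yml looks roughly like this (a minimal sketch; the two hosts and the loadbalance setting are from our actual setup, everything else is default):

output.logstash:
  # both Logstash instances, events are balanced across them
  hosts: ["10.10.10.179:5044", "10.10.10.180:5044"]
  loadbalance: true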
Otherwise the events arrive in Elasticsearch (5.6.2) as expected, so I assume the errors are caused by Logstash applying back pressure to Filebeat? Is this correct?
Setting "client_inactivity_timeout => 240" on the beats input doesn't seems to make big difference.