Hello,
I use Filebeat 5.6 with the Logstash output (Logstash is 5.2.2), with loadbalance: true and pipelining enabled. Sometimes the connection between Filebeat and Logstash drops, and I see messages like:
2017-12-05T07:04:58+01:00 ERR Failed to publish events (host: elkprod-netflow-logstash-tc2-3.priv.sewan.fr:10010) caused by: write tcp 172.16.68.7:56516->172.16.64.90:10010: write: connection reset by peer
The main issue is that after this message, Filebeat stops sending anything to Logstash until I restart it:
2017-12-05T07:04:58+01:00 ERR Failed to publish events (host: elkprod-netflow-logstash-tc2-3.priv.sewan.fr:10010) caused by: write tcp 172.16.68.7:56516->172.16.64.90:10010: write: connection reset by peer
2017-12-05T07:05:01+01:00 INFO Non-zero metrics in the last 30s: filebeat.harvester.open_files=2 filebeat.harvester.running=2 filebeat.harvester.started=2 libbeat.logstash.call_count.PublishEvents=19 libbeat.logstash.publish.read_bytes=96 libbeat.logstash.publish.write_bytes=1755547 libbeat.logstash.publish.write_errors=2 libbeat.logstash.published_and_acked_events=32764 libbeat.logstash.published_but_not_acked_events=4096 libbeat.publisher.published_events=30716 publish.events=28672 registrar.states.current=2 registrar.states.update=28672 registrar.writes=14
2017-12-05T07:05:31+01:00 INFO Non-zero metrics in the last 30s: libbeat.logstash.call_count.PublishEvents=65 libbeat.logstash.publish.read_bytes=390 libbeat.logstash.publish.write_bytes=6763371 libbeat.logstash.published_and_acked_events=133120 libbeat.publisher.published_events=133120 publish.events=133120 registrar.states.update=133120 registrar.writes=65
2017-12-05T07:06:01+01:00 INFO No non-zero metrics in the last 30s
2017-12-05T07:06:31+01:00 INFO No non-zero metrics in the last 30s
2017-12-05T07:07:01+01:00 INFO No non-zero metrics in the last 30s
2017-12-05T07:07:31+01:00 INFO No non-zero metrics in the last 30s
2017-12-05T07:08:01+01:00 INFO No non-zero metrics in the last 30s
2017-12-05T07:08:31+01:00 INFO No non-zero metrics in the last 30s
2017-12-05T07:09:01+01:00 INFO No non-zero metrics in the last 30s
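As a stopgap, the pattern in this excerpt (a publish error followed by nothing but idle metric reports) can be spotted by a script. Here is a minimal sketch in Python, assuming the log line format shown above. The function name `find_stall` and the threshold of 3 idle reports are my own choices for illustration, not anything provided by Filebeat, and a genuinely quiet period after an old error would also match, so this is only a heuristic:

```python
ERROR_MARK = "ERR Failed to publish events"
ACTIVE_MARK = "INFO Non-zero metrics"
IDLE_MARK = "INFO No non-zero metrics"


def find_stall(lines, min_idle=3):
    """Look for a publish error followed by `min_idle` consecutive idle reports.

    Returns (error_timestamp, first_idle_timestamp), or None if the pattern
    is not found. Assumes each line starts with a timestamp and a space.
    """
    error_ts = None     # timestamp of the most recent publish error
    first_idle = None   # timestamp of the first report in the current idle run
    idle = 0            # length of the current run of idle reports
    for line in lines:
        ts, _, rest = line.partition(" ")
        if rest.startswith(ERROR_MARK):
            error_ts, first_idle, idle = ts, None, 0
        elif rest.startswith(ACTIVE_MARK):
            # Events are still flowing: reset the idle run, keep the error.
            first_idle, idle = None, 0
        elif rest.startswith(IDLE_MARK) and error_ts is not None:
            if idle == 0:
                first_idle = ts
            idle += 1
            if idle >= min_idle:
                return error_ts, first_idle
    return None
```

Fed with the lines of the Filebeat log file, it returns the timestamp of the last publish error and of the first idle report once the idle run is long enough; a cron job could then restart Filebeat, although that only works around the problem.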
I tried upgrading Filebeat (I first noticed this bug with 5.2; I now run 5.6). I have seen the issue "Filebeat doesn't resume transmission to logstash after connection interruption", which describes the same kind of problem, but it is marked as fixed.
Here is my filebeat config:
filebeat.config_dir: /etc/filebeat/conf.d
filebeat.shutdown_timeout: 3000s
logging.files: {keepfiles: 24, name: filebeat, path: /var/log/filebeat, rotateeverybytes: 10485760}
logging.level: info
logging.metrics.enabled: true
logging.metrics.period: 30s
logging.to_files: true
filebeat.spool_size: 4096
output.logstash:
  hosts: ["host1.company.fr", "host2.company.fr"]
  port: 10010
  loadbalance: true
  pipelining: 100
  worker: 40
Is there a way to automatically restart the harvesting/processing of data? Should I disable pipelining?
Note: the disconnections themselves are not a problem for us as long as Filebeat does not need a restart afterwards. I did notice, however, that they happen when there is little data to send (not in every 30s interval), at night or during weekends.
Thank you,
Regards,
Grégoire