Filebeat does not resume logstash connections

Hello,

I use filebeat 5.6 with the logstash output (logstash is 5.2.2), with loadbalance: true and pipelining enabled. Sometimes there are disconnections between filebeat and logstash, and I see messages like:

2017-12-05T07:04:58+01:00 ERR Failed to publish events (host: elkprod-netflow-logstash-tc2-3.priv.sewan.fr:10010) caused by: write tcp 172.16.68.7:56516->172.16.64.90:10010: write: connection reset by peer

The main issue here is that after this message, filebeat stops sending anything to logstash until I restart filebeat:

2017-12-05T07:04:58+01:00 ERR Failed to publish events (host: elkprod-netflow-logstash-tc2-3.priv.sewan.fr:10010) caused by: write tcp 172.16.68.7:56516->172.16.64.90:10010: write: connection reset by peer
2017-12-05T07:05:01+01:00 INFO Non-zero metrics in the last 30s: filebeat.harvester.open_files=2 filebeat.harvester.running=2 filebeat.harvester.started=2 libbeat.logstash.call_count.PublishEvents=19 libbeat.logstash.publish.read_bytes=96 libbeat.logstash.publish.write_bytes=1755547 libbeat.logstash.publish.write_errors=2 libbeat.logstash.published_and_acked_events=32764 libbeat.logstash.published_but_not_acked_events=4096 libbeat.publisher.published_events=30716 publish.events=28672 registrar.states.current=2 registrar.states.update=28672 registrar.writes=14
2017-12-05T07:05:31+01:00 INFO Non-zero metrics in the last 30s: libbeat.logstash.call_count.PublishEvents=65 libbeat.logstash.publish.read_bytes=390 libbeat.logstash.publish.write_bytes=6763371 libbeat.logstash.published_and_acked_events=133120 libbeat.publisher.published_events=133120 publish.events=133120 registrar.states.update=133120 registrar.writes=65
2017-12-05T07:06:01+01:00 INFO No non-zero metrics in the last 30s
2017-12-05T07:06:31+01:00 INFO No non-zero metrics in the last 30s
2017-12-05T07:07:01+01:00 INFO No non-zero metrics in the last 30s
2017-12-05T07:07:31+01:00 INFO No non-zero metrics in the last 30s
2017-12-05T07:08:01+01:00 INFO No non-zero metrics in the last 30s
2017-12-05T07:08:31+01:00 INFO No non-zero metrics in the last 30s
2017-12-05T07:09:01+01:00 INFO No non-zero metrics in the last 30s

I tried upgrading filebeat (I first noticed this bug with 5.2; I now run 5.6). I have seen the topic "Filebeat doesn't resume transmission to logstash after connection interruption", which describes the same kind of issue, but it is marked as fixed.

Here is my filebeat config:

filebeat.config_dir: /etc/filebeat/conf.d
filebeat.shutdown_timeout: 3000s
logging.files: {keepfiles: 24, name: filebeat, path: /var/log/filebeat, rotateeverybytes: 10485760}
logging.level: info
logging.metrics.enabled: true
logging.metrics.period: 30s
logging.to_files: true
filebeat.spool_size: 4096
output.logstash:
  hosts: ["host1.company.fr", "host2.company.fr"]
  port: 10010
  loadbalance: true
  pipelining: 100
  worker: 40

Is there a way to automatically relaunch the harvesting/processing of data? Should I disable pipelining?

Note: the "disconnection" issue itself is not a problem for us, as long as filebeat does not need a restart afterwards. However, I noticed that disconnections happen when there is not a lot of data to send (not every 30s), at night or during weekends.

Thank you,
Regards,
Grégoire

The pipelining setting can lead to a deadlock. It's a known bug, recently fixed in the 5.x branch (to be released with 5.6.5, I think).

Given the spool size and the default batch size of 2048, you will have at most 2 batches active in the output. That is, only 2 of the 80 workers will process events; the other 78 will be idle most of the time. The effect of the very large pipelining setting is mostly to disable the slow-start in the output. With the most recent 5.6.4 release, slow start is disabled by default, so you don't need to set pipelining anymore.
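The batch math above can be sketched like this (values taken from the posted config; 2048 is the default batch size mentioned in the reply):

```python
# Sketch of the worker/batch arithmetic described above.
spool_size = 4096          # filebeat.spool_size from the posted config
batch_size = 2048          # default batch size, per the reply
hosts = 2                  # host1 and host2 in output.logstash.hosts
workers_per_host = 40      # worker: 40

total_workers = hosts * workers_per_host
batches_in_flight = spool_size // batch_size

print(total_workers)       # 80 workers configured in total
print(batches_in_flight)   # only 2 batches can be active at once
```

So at any moment at most 2 of the 80 configured workers can have a batch to send; the rest just add connection overhead.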

The filebeat publishing pipeline in 6.0 has been completely rewritten to operate fully asynchronously. In addition, slow start is disabled and pipelining is set to 5 by default. We've seen throughput improvements with 6.0, so you may want to give it a try.
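Putting the advice above into practice on 5.6.4+, a simplified output section might look like the sketch below. This is only an illustration based on the replies (pipelining removed, worker count reduced to match the ~2 batches in flight); the hostnames are the placeholders from the original post, and the exact values are assumptions to tune, not recommendations from the thread:

```yaml
output.logstash:
  hosts: ["host1.company.fr", "host2.company.fr"]
  port: 10010
  loadbalance: true
  # pipelining removed: slow start is off by default since 5.6.4
  worker: 2        # assumption: match the number of batches in flight
```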

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.