Filebeat can work normally for many days, but then it randomly begins to delay messages.
Just look at the message count chart:
The y-axis is messages per second. Time is taken from the log timestamps (not the message receive time). These holes are evenly filled in with data after some time.
The longer it runs, the more delay accumulates (a longer tail with holes).
After a Filebeat restart, it sends all the delayed messages as fast as possible and everything works fine... for some time.
There is also a strange statistic logged by Filebeat when these delays happen:
INFO Non-zero metrics in the last 30s: filebeat.harvester.open_files=5 filebeat.harvester.running=5 filebeat.harvester.started=5 libbeat.logstash.call_count.PublishEvents=39 libbeat.logstash.publish.read_bytes=12320 libbeat.logstash.publish.write_bytes=1731451 libbeat.logstash.published_and_acked_events=19968 libbeat.publisher.published_events=16380 publish.events=16384 registrar.states.update=16384 registrar.writes=2
publish.events=16384 does not match libbeat.logstash.published_and_acked_events=19968.
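To make the mismatch easier to spot across many log lines, the counters can be pulled out of the metrics line and compared. The following is only a rough sketch: `parseMetrics` is a hypothetical helper, not part of Filebeat, and the sample line is shortened to the counters quoted above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMetrics is a hypothetical helper (not part of Filebeat). It collects
// the integer key=value counters from a "Non-zero metrics" log line and
// skips every token that has no '=' or whose value is not an integer.
func parseMetrics(line string) map[string]int {
	out := make(map[string]int)
	for _, tok := range strings.Fields(line) {
		kv := strings.SplitN(tok, "=", 2)
		if len(kv) != 2 {
			continue
		}
		n, err := strconv.Atoi(kv[1])
		if err != nil {
			continue
		}
		out[kv[0]] = n
	}
	return out
}

func main() {
	// Shortened copy of the log line from the post.
	line := "INFO Non-zero metrics in the last 30s: " +
		"libbeat.logstash.published_and_acked_events=19968 " +
		"libbeat.publisher.published_events=16380 publish.events=16384"

	m := parseMetrics(line)
	acked := m["libbeat.logstash.published_and_acked_events"]
	queued := m["publish.events"]
	fmt.Println(acked - queued) // prints 3584
}
```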
I use Filebeat 5.3.0 for Windows 32-bit (on 64-bit machine)
It seems I've found the source of the problem.
It is the Logstash output's window size detection logic.
It grows maxWindowSize until a send fails and then gets stuck on that maxWindowSize. So the window size can never get larger after the first send failure.
In rare conditions, with an unstable internet connection, the max window size can stop at the minimum value (10 in my case). With such a small window size Filebeat can no longer send all incoming messages, and the queue starts to grow.
Restarting Filebeat resets maxWindowSize, and it grows normally up to batchSize.
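The behavior described above can be illustrated with a small simulation. This is a simplified model under my own assumptions, not the actual code from window.go: the `window` type, its `ceiling` field, the growth factor, and the batch size of 2048 are all hypothetical stand-ins chosen for illustration.

```go
package main

import "fmt"

// minWindowSize mirrors the minimum of 10 mentioned in the post.
const minWindowSize = 10

// window is a hypothetical, simplified model of the sizing logic.
// It is NOT the libbeat type.
type window struct {
	size    int // events sent per request
	ceiling int // largest size the window is allowed to grow to
}

func newWindow(batchSize int) *window {
	return &window{size: minWindowSize, ceiling: batchSize}
}

// onSuccess grows the window by roughly 1.5x, capped at the ceiling.
func (w *window) onSuccess() {
	next := w.size + w.size/2 + 1
	if next > w.ceiling {
		next = w.ceiling
	}
	w.size = next
}

// onFailure halves the window (never below the minimum) and pins the
// ceiling to the new size. Nothing in this model raises the ceiling again.
func (w *window) onFailure() {
	next := w.size / 2
	if next < minWindowSize {
		next = minWindowSize
	}
	w.size = next
	w.ceiling = next
}

// healthySizeAfter returns the window size after only successful sends.
func healthySizeAfter(batchSize, successes int) int {
	w := newWindow(batchSize)
	for i := 0; i < successes; i++ {
		w.onSuccess()
	}
	return w.size
}

// stuckSizeAfter returns the window size when the very first send fails
// and every later send succeeds.
func stuckSizeAfter(batchSize, successes int) int {
	w := newWindow(batchSize)
	w.onFailure()
	for i := 0; i < successes; i++ {
		w.onSuccess()
	}
	return w.size
}

func main() {
	fmt.Println(healthySizeAfter(2048, 20)) // prints 2048
	fmt.Println(stuckSizeAfter(2048, 1000)) // prints 10
}
```

In this model a single early failure is enough to hold the window at the minimum no matter how many sends succeed afterwards, which matches the symptom of a queue that only drains after a restart.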
In my case I patched the Logstash output to always set maxWindowSize to the largest batchSize (in the tryGrowWindow() method of window.go). Now everything works fine.
I think the user must be able to configure maxWindowSize, and in the case of Filebeat it must be equal to batchSize.
Depending on the Logstash version, dynamic sizing is required so that Logstash is not overloaded and events are not killed and resent because Logstash is no longer answering. The problem with the dynamic sizing is that it cannot get past maxWindowSize right now. Potential solutions:
1) have the window sizing probe bigger sizes (randomly), so the output does not get stuck on maxWindowSize.
2) introduce a setting to disable dynamic window sizing.
As dynamic windowing plus slow-start is not that critical with the most recent versions of the logstash-input-beats plugin, I'd even disable windowing by default. Pull requests are very welcome.
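Both potential solutions can be sketched on a toy model. Everything here is an assumption made for illustration and none of it is existing libbeat code or an existing Filebeat setting: the `window` type, the `dynamic` flag, the `probeChance` probability, and the doubling of the ceiling on a probe are all hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
)

const minWindowSize = 10

// window is a hypothetical model, not the libbeat type.
type window struct {
	size, ceiling, batchSize int
	dynamic                  bool    // false: option 2, always send full batches
	probeChance              float64 // option 1: chance to raise the ceiling
	rng                      *rand.Rand
}

func newWindow(batchSize int, dynamic bool, probeChance float64, seed int64) *window {
	return &window{
		size:        minWindowSize,
		ceiling:     batchSize,
		batchSize:   batchSize,
		dynamic:     dynamic,
		probeChance: probeChance,
		rng:         rand.New(rand.NewSource(seed)),
	}
}

// get returns the number of events to send in the next request.
func (w *window) get() int {
	if !w.dynamic {
		return w.batchSize
	}
	return w.size
}

// onSuccess grows the window. When the window already sits at a lowered
// ceiling, it occasionally doubles the ceiling to probe a bigger size.
func (w *window) onSuccess() {
	if !w.dynamic {
		return
	}
	atCeiling := w.size >= w.ceiling && w.ceiling < w.batchSize
	if atCeiling && w.rng.Float64() < w.probeChance {
		w.ceiling *= 2
		if w.ceiling > w.batchSize {
			w.ceiling = w.batchSize
		}
	}
	next := w.size + w.size/2 + 1
	if next > w.ceiling {
		next = w.ceiling
	}
	w.size = next
}

// onFailure halves the window and lowers the ceiling to the new size.
func (w *window) onFailure() {
	if !w.dynamic {
		return
	}
	next := w.size / 2
	if next < minWindowSize {
		next = minWindowSize
	}
	w.size = next
	w.ceiling = next
}

// sizeAfterEarlyFailure fails the first send, then succeeds n times.
func sizeAfterEarlyFailure(w *window, n int) int {
	w.onFailure()
	for i := 0; i < n; i++ {
		w.onSuccess()
	}
	return w.get()
}

func main() {
	fmt.Println(sizeAfterEarlyFailure(newWindow(2048, true, 0, 1), 10000))   // no probing
	fmt.Println(sizeAfterEarlyFailure(newWindow(2048, true, 0.1, 1), 10000)) // option 1
	fmt.Println(sizeAfterEarlyFailure(newWindow(2048, false, 0, 1), 10000))  // option 2
}
```

With probing disabled the model stays at the minimum, while either option lets it return to the full batch size. A real probe would also need to handle the case where the larger size fails again, which this sketch only covers by lowering the ceiling once more.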