Filebeat logstash communication

Hi,
I've taken a look at

However, I still have several questions about Filebeat's Logstash output.

  1. How many TCP connections are opened? In my testing, even with pipelining left at its default and several workers configured (say 4), I only saw one TCP connection to Logstash.

  2. Can I list the same host multiple times to get multiple TCP connections to Logstash?

  3. Is there a way to see backpressure or communication issues in "monitoring", specifically problems with the lumberjack window size when talking to Logstash?

  4. Harvester <-> output flow: does a harvester read as fast as it can while the output sends as fast as it can, with some queuing mechanism between them? Or does a harvester pause on a file at some point until it is told, essentially when the output is ready, to read some more?

My core issue is a large time gap (sometimes hours, even days, though it doesn't always happen) between Filebeat's @timestamp and the timestamp in the message itself. Restarting Filebeat clears the backlog within a few seconds.
It could be a file-reading problem, but could it also be that Logstash's ACKs cause Filebeat to slow down and stop reading the files?

As an experiment, I ran two Filebeat instances on the same log files. One of them showed the big gap, while the other (same logs, same machine, same time) did not.

The only difference, then, is that each instance has its own TCP connection.
If that's the cause, could setting TTL to some value (to force reconnection) mitigate it? That would require disabling pipelining (as the async client doesn't support TTL), so can I also get multiple connections by listing the same host several times in "hosts:"?
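
To make the question concrete, this is roughly the output section I have in mind. It is only a sketch: the host name and the ttl value are placeholders, and whether the duplicated entry really results in two separate TCP connections is exactly what I'm asking.

```yaml
# Sketch only: host name and ttl value are placeholders.
output.logstash:
  # Same endpoint listed twice; I don't know whether this opens two connections.
  hosts: ["logstash.example.internal:5044", "logstash.example.internal:5044"]
  loadbalance: true   # without this, only one of the listed hosts is used at a time
  worker: 1           # number of workers per configured host
  pipelining: 0       # disable async pipelining, since ttl is not supported with it
  ttl: 60s            # re-establish the connection after this long
```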
