Filebeat clients unable to publish events to logstash

Hm... the bundled version sets up its own logstash user and group, plus the service. AFAIK, other than that, it should be the same.

Can you try setting client_inactivity_timeout? Check this topic.

Edit: Just to clarify my suggestion: please increase the value and test.
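For reference, client_inactivity_timeout is set on the beats input in the Logstash pipeline config. A minimal sketch (the 300-second value is just an example to test with, not a recommendation):

```conf
input {
  beats {
    port => 5044
    client_inactivity_timeout => 300  # seconds; default is 60
  }
}
```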


Will do... though I'm not sure what the best value is here. I think what tricked me into not looking at the client_inactivity_timeout option before is that the connection reset by peer message only appears when a new message is sent, not right after the connection has been reset by logstash. I thought filebeat would know about the connection reset before attempting another send to logstash, but it looks like I was wrong there haha.

My logs sometimes produce new messages every second, but can also go without new messages for a few hours, up to a few days at most. So I think as long as I'm receiving log messages I can ignore the reset by peer messages, as it's expected behaviour.

@Rios @stephenb @RainTown you mentioned that the ttl on the filebeat logstash output is currently not supported when the pipelining option is used. How could I utilize the ttl option while at the same time ensuring that messages are sent in the correct order in which they appear in the logs? I was hoping I could use pipelining: 1 to ensure there's only one batch of events sent at once. Any suggestions?

You have just one Logstash instance right?

In this case I don't think that setting ttl and pipelining will make any difference here.

Those settings are mostly used when you have multiple logstash instances or a load balancer in front of your logstash.

I believe that you should remove both from the configuration and use the defaults.

Correct. At the moment I only have one logstash instance but I might end up adding 1 or 2 down the line if needed.

The "ttl" option is not yet supported on an async Logstash client (one with the "pipelining" option set). - from the documentation
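So if you do want ttl later (e.g. once more Logstash instances are added), pipelining would have to be disabled. A hedged sketch of what that output section could look like (the host names are placeholders, and the 60s value is just an example):

```yaml
output.logstash:
  hosts: ["logstash-a:5044", "logstash-b:5044"]
  loadbalance: true
  ttl: 60s          # periodically drop and re-establish connections
  pipelining: 0     # ttl is not supported while pipelining is enabled
```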

I was able to "reproduce" the Connection reset messages, but without it causing an issue. I also used Rocky Linux 9.5. Firewall on/off did not appear to make any difference.

The slightly strange thing is that the two sides disagree on keepalive timer status. I do see the TCP keepalive packets (and ACKs) with tcpdump on both hosts, but the logstash end seems to ignore them. And sometime after the 3x 15-second countdowns I see at the filebeat end, it (logstash) sends a RST.

The filebeat end shows this in netstat output:

$ sudo netstat -no | fgrep EST | fgrep 5044
tcp        0      0 192.168.178.67:45842    192.168.178.66:5044     ESTABLISHED keepalive (0.73/0/0)

But the logstash end shows this (note the "off"):

$ sudo netstat -no | fgrep EST | fgrep 5044
tcp6       0      0 192.168.178.66:5044     192.168.178.67:45842    ESTABLISHED off (0.00/0/0)
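As an aside, the filebeat-side keepalive timer shown above can be mimicked with a plain socket. A minimal Python sketch (Linux-only socket options; the 15-second values are assumptions chosen to mirror the countdowns netstat showed, not filebeat's actual internals):

```python
import socket

def make_keepalive_socket(idle=15, interval=15, count=3):
    """Create a TCP socket with keepalive probes enabled (Linux options)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Seconds of idle time before the first keepalive probe is sent
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    # Seconds between subsequent probes
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    # Number of unanswered probes before the connection is dropped
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return s

sock = make_keepalive_socket()
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))  # nonzero when enabled
sock.close()
```

With these settings the kernel, not the application, sends the probes, which is why both ends ACK them even though logstash's side shows no timer of its own.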

As long as the "pipe" is kept busy, it's a non-issue. I didn't see any lost logs, though I only tested for a little while, of course.

If it's not busy, sure, the RST happens, but a new TCP connection is set up when required.

Below is the Wireshark view, the .66 is logstash and the .67 is filebeat (for the eagle eyed, my Wireshark source port / destination port columns are the wrong way round!)

The packet with length 754 is the actual log being sent from filebeat to logstash, the only one in that time window.

Network experts might wish to weigh in.


That's interesting and sort of confirms what I'm seeing too on not-so-busy logs where connections do reset. Thanks for taking the time to reproduce this on your end. I think I chased the wrong end at first because I was somehow expecting the connection reset messages to be thrown when the connection is actually being reset by logstash, not on new message send attempts.

Adjusting the output.logstash.timeout to a slightly higher value has fixed the java.net.SocketException: Connection reset I saw on my remote filebeat nodes and I have logstash now receiving events successfully.

I am glad you found a fix, but maybe my wording was not clear (it was late!), so let me rephrase.

What I found was that a TCP connection was set up, initiated by filebeat, and some data was transferred and ACKed. Filebeat also sent 6 bytes of "data" inside another TCP packet, which always contained the same bytes - 32 41 00 00 00 01 (in hex) - and which was also ACKed by logstash.

Then, assuming no more logs to send, filebeat sent 3 keepalives, around 15 seconds apart, which logstash ACKed but seemed to just ignore (no timer showing in the netstat output). And the TCP session was RST by logstash almost precisely 60 seconds after the last "real data" was exchanged.

Note this 60 seconds had (in my tests) nothing to do with the timeout: 60 setting under filebeat's output.logstash section; I changed it to both less than AND more than 60 and it made no difference for me - the RST still came 60s after the last "real data" packet. (It does line up with the beats input's client_inactivity_timeout, whose default is 60 seconds.)

As soon as new data was to be sent to logstash, a new TCP session was established, using a different TCP port at the filebeat end and obviously 5044 at the logstash end.


Gotcha. Thanks for the more detailed explanation. I did do a tcpdump as well, but I'm not that good with Wireshark and analyzing exactly what's going on, so I appreciate your post! I'll try setting the client_inactivity_timeout value to something higher and lower than the default and see if that matches when the RST comes up.

To anyone interested, below is my working config now.

conf.d/to_confluent.conf

input {
  beats {
    port => 5044
    client_inactivity_timeout => 300
  }
}

filebeat.yml

filebeat.inputs:
- type: filestream
  id: access-logs
  enabled: true
  paths:
    - /mnt/*/log/access.log

  prospector.scanner.fingerprint:
    enabled: true
    offset: 0
    length: 64
  file_identity.fingerprint: ~
  clean_removed: false

processors:
  - add_host_metadata:
      when.not.contains.tags: forwarded

output.logstash:
  hosts: ["xxx.xxx.xxx.xxx:5044"]
  bulk_max_size: 1
  timeout: 300
  pipelining: 0