Hm.... the bundled version sets up its own logstash user & group and the service. AFAIK other than that, it should be the same.
Can you try to set client_inactivity_timeout? Check this topic.
Edit: Just a suggestion - please increase it and test.
Will do, though I'm not sure what the best value is here. I think what tricked me into not looking into the client_inactivity_timeout option earlier is that the connection reset by peer message only appears on the next send attempt and not right after the connection has been reset by logstash. I thought filebeat would know about the connection reset before attempting another send to logstash, but it looks like I was wrong there haha.
My logs can sometimes populate new messages on a per-second basis but also sometimes have no new messages for a few hours, up to a few days at most. So I think as long as I'm receiving log messages I can ignore the reset by peer messages, as it's expected behaviour.
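That matches how TCP behaves in general: a client only learns about a reset when it next touches the socket. Below is a minimal, self-contained Python sketch (plain sockets, nothing filebeat-specific) that reproduces the "reset only noticed on the next send" behaviour - the server closes with SO_LINGER set to 0, which makes close() emit a RST instead of a FIN:

```python
import socket
import struct
import threading
import time

def rst_server(port_holder, ready):
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", 0))
    port_holder.append(srv.getsockname()[1])
    srv.listen(1)
    ready.set()
    conn, _ = srv.accept()
    # SO_LINGER with a 0 timeout makes close() send a TCP RST instead of FIN,
    # mimicking logstash resetting an idle beats connection.
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack("ii", 1, 0))
    time.sleep(0.2)  # simulated inactivity window before the reset
    conn.close()
    srv.close()

port_holder, ready = [], threading.Event()
t = threading.Thread(target=rst_server, args=(port_holder, ready))
t.start()
ready.wait()

cli = socket.socket()
cli.connect(("127.0.0.1", port_holder[0]))
time.sleep(0.5)  # idle past the server's "inactivity timeout"; RST arrives here

# The client only learns about the reset when it tries to send again:
try:
    cli.send(b"new log line\n")
    result = "sent"
except ConnectionResetError:
    result = "connection reset by peer"
print(result)
t.join()
```

The RST sits unnoticed in the client until the next send() call, which is exactly why the error message shows up on a new message send rather than at the moment logstash drops the connection.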
@Rios @stephenb @RainTown you mentioned that the ttl option on the filebeat logstash output is currently not supported when the pipelining option is used. How could I utilize the ttl option but at the same time ensure that messages are sent in the order in which they appear in the logs? I was hoping I could use pipelining: 1 to ensure there's only one batch of events sent at once. Any suggestions?
You have just one Logstash instance, right?
In this case I don't think that setting ttl and pipelining will make any difference here.
Those settings are mostly used when you have multiple logstash instances or a load balancer in front of your logstash.
I believe that you should remove both from the configuration and use the defaults.
Correct. At the moment I only have one logstash instance but I might end up adding 1 or 2 down the line if needed.
The "ttl" option is not yet supported on an async Logstash client (one with the "pipelining" option set). - from the documentation
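For completeness: per that doc note, ttl only takes effect when pipelining is disabled, so a config that actually uses ttl would look roughly like the sketch below (the host is a placeholder, and the 60s value is just an example):

```yaml
output.logstash:
  hosts: ["logstash.example:5044"]  # placeholder host
  ttl: 60s          # tear down and re-dial the connection every 60s
  pipelining: 0     # ttl is only honored with pipelining disabled
```

Note that pipelining: 0 disables pipelining entirely, which is different from pipelining: 1 (one in-flight batch).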
I was able to "reproduce" the Connection reset messages, but without it causing an issue. I also used Rocky Linux 9.5. Firewall on/off did not appear to make any difference.
The slightly strange thing is the 2 sides disagree on keepalive timer status. I do see the TCP keepalive packets (and ACKs) with tcpdump on both hosts. But the logstash end seems to ignore them. And sometime after the 3x 15-second countdowns I see at the filebeat end, it (logstash) sends a RST.
The filebeat end shows this in netstat output:
$ sudo netstat -no | fgrep EST | fgrep 5044
tcp 0 0 192.168.178.67:45842 192.168.178.66:5044 ESTABLISHED keepalive (0.73/0/0)
But the logstash end shows this: (note the "off")
$ sudo netstat -no | fgrep EST | fgrep 5044
tcp6 0 0 192.168.178.66:5044 192.168.178.67:45842 ESTABLISHED off (0.00/0/0)
As long as the "pipe" is kept busy it's a non-issue. I didn't see any lost logs, though I tested for just a little while of course.
If it's not busy, sure, the RST happens, but a new TCP connection is set up when required.
Below is the Wireshark view, the .66 is logstash and the .67 is filebeat (for the eagle eyed, my Wireshark source port / destination port columns are the wrong way round!)
The packet with length 754 is the actual log being sent from filebeat to logstash, the only one in that time window.
Network experts might wish to weigh in.
That's interesting and sort of confirms what I'm seeing too on not-so-busy logs where connections do reset. Thanks for taking the time to reproduce this on your end. I think I chased the wrong end at first because I was somehow expecting the connection reset messages to be thrown when the connection is actually being reset by logstash, and not on new message send attempts.
Adjusting the output.logstash.timeout to a slightly higher value has fixed the java.net.SocketException: Connection reset
I saw on my remote filebeat nodes and I have logstash now receiving events successfully.
I am glad you found a fix, but maybe my wording was not clear (it was late!), so let me rephrase.
What I found was that a TCP connection was set up, initiated by filebeat, some data transferred and ACKed; filebeat also sent 6 bytes of "data" inside another TCP packet which always contained the same payload - 32 41 00 00 00 01 (those are hex) - which was also ACKed by logstash.
Then, assuming no more logs to send, filebeat sent 3 keepalives, around 15 seconds apart, which logstash ACK-ed but seemed to just ignore (no timer showing on the netstat output). And the TCP session was RST by logstash almost precisely 60 seconds after the last "real data" was exchanged.
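Incidentally, those 6 bytes decode cleanly as a Lumberjack (beats) protocol frame: one ASCII version byte, one ASCII frame-type byte, then a big-endian 32-bit sequence number. An 'A' frame is an ACK, which normally travels from Logstash back to the client - so given the swapped port columns mentioned earlier, this packet may actually have come from logstash rather than filebeat. A quick decode sketch:

```python
import struct

# The recurring 6-byte payload from the capture, interpreted as a
# Lumberjack frame: version byte, frame-type byte, 32-bit big-endian
# sequence number.
frame = bytes.fromhex("324100000001")
version = chr(frame[0])      # '2' -> protocol version 2
frame_type = chr(frame[1])   # 'A' -> ACK frame
seq = struct.unpack(">I", frame[2:6])[0]
print(version, frame_type, seq)  # → 2 A 1
```

So the payload reads as "version 2, ACK, sequence 1" - consistent with logstash acknowledging the single batch sent in that window.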
Note this 60 seconds had (in my tests) nothing to do with the timeout: 60 setting under filebeat's output.logstash section; I changed it to both less than AND more than 60 and it made no difference for me - the RST still came 60s after the last "real data" packet.
As soon as new data was to be sent to logstash, a new TCP session was established, using a different TCP port at the filebeat end and obviously 5044 at the logstash end.
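For what it's worth, that fixed 60-second mark lines up with the beats input's client_inactivity_timeout on the Logstash side, which defaults to 60 seconds - which would also explain why changing filebeat's timeout made no difference. Raising it on the Logstash side looks like this (300 is just an example value):

```
input {
  beats {
    port => 5044
    client_inactivity_timeout => 300   # default is 60; raise to tolerate idle periods
  }
}
```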
Gotcha. Thanks for the more detailed explanation. I did do a tcpdump as well, but I'm not that good with Wireshark and analyzing what's going on exactly, so I appreciate your post! I'll try setting the client_inactivity_timeout value to something higher and lower than the default and see if that matches when the RST comes up.
To anyone interested, below is my working config now.
conf.d/to_confluent.conf
input {
  beats {
    port => 5044
    client_inactivity_timeout => 300
  }
}
filebeat.yml
filebeat.inputs:
  - type: filestream
    id: access-logs
    enabled: true
    paths:
      - /mnt/*/log/access.log
    prospector.scanner.fingerprint:
      enabled: true
      offset: 0
      length: 64
    file_identity.fingerprint: ~
    clean_removed: false

processors:
  - add_host_metadata:
      when.not.contains.tags: forwarded

output.logstash:
  hosts: ["xxx.xxx.xxx.xxx:5044"]
  bulk_max_size: 1
  timeout: 300
  pipelining: 0