Hi all,
Setup: Logstash 5.4.1, Filebeat 5.4.1, Ubuntu, development machines (no traffic other than what is shown below).
I have seen a number of threads about the following log messages in Filebeat (port 8998 is the Logstash port), where the cause was sometimes misconfiguration, throttling, etc.
2017/06/19 02:48:50.308090 sync.go:85: ERR Failed to publish events caused by: write tcp 10.1.0.12:56052->10.1.0.8:8998: write: connection reset by peer
2017/06/19 02:48:50.308121 single.go:91: INFO Error publishing events (retrying): write tcp 10.1.0.12:56052->10.1.0.8:8998: write: connection reset by peer
In my case log events are infrequent, so Logstash's default 60-second client inactivity timeout often fires well before Filebeat's 5-minute inactivity timer. Here is how it looks from the Filebeat host: packet #8 is the RST from Logstash, which sets the stage for the error above when the next log message is ready to be sent at packet #12 (the rest of the transaction is snipped).
abc@host5:/home/abc$ tshark -i ethGi1 -n "tcp port 8998"
Capturing on 'ethGi1'
1 0.000000 10.1.0.12 -> 10.1.0.8 TCP 74 56052 > 8998 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=1651206007 TSecr=0 WS=128
2 0.000164 10.1.0.8 -> 10.1.0.12 TCP 74 8998 > 56052 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=264028982 TSecr=1651206007 WS=128
3 0.000207 10.1.0.12 -> 10.1.0.8 TCP 66 56052 > 8998 [ACK] Seq=1 Ack=1 Win=29312 Len=0 TSval=1651206007 TSecr=264028982
4 0.000839 10.1.0.12 -> 10.1.0.8 TCP 594 56052 > 8998 [PSH, ACK] Seq=1 Ack=1 Win=29312 Len=528 TSval=1651206007 TSecr=264028982
5 0.000952 10.1.0.8 -> 10.1.0.12 TCP 66 8998 > 56052 [ACK] Seq=1 Ack=529 Win=30080 Len=0 TSval=264028982 TSecr=1651206007
6 0.106094 10.1.0.8 -> 10.1.0.12 TCP 72 8998 > 56052 [PSH, ACK] Seq=1 Ack=529 Win=30080 Len=6 TSval=264029008 TSecr=1651206007
7 0.106121 10.1.0.12 -> 10.1.0.8 TCP 66 56052 > 8998 [ACK] Seq=529 Ack=7 Win=29312 Len=0 TSval=1651206033 TSecr=264029008
8 60.057794 10.1.0.8 -> 10.1.0.12 TCP 66 8998 > 56052 [RST, ACK] Seq=7 Ack=529 Win=30080 Len=0 TSval=264043996 TSecr=1651206033
9 116.003717 10.1.0.12 -> 10.1.0.8 TCP 74 56130 > 8998 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=1651235008 TSecr=0 WS=128
10 116.003884 10.1.0.8 -> 10.1.0.12 TCP 74 8998 > 56130 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=264057983 TSecr=1651235008 WS=128
11 116.003929 10.1.0.12 -> 10.1.0.8 TCP 66 56130 > 8998 [ACK] Seq=1 Ack=1 Win=29312 Len=0 TSval=1651235008 TSecr=264057983
12 116.005091 10.1.0.12 -> 10.1.0.8 TCP 508 56130 > 8998 [PSH, ACK] Seq=1 Ack=1 Win=29312 Len=442 TSval=1651235008 TSecr=264057983
Test procedure: Filebeat runs in -once mode with the default 5-minute timeout, and Logstash is set to debug, writing to stdout only. I add a single new log line to a watched file before starting Filebeat, wait for the RST from Logstash, then cat another single log line into the watched file.
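To make the timing in the capture explicit, here is a small sketch that computes the gaps between packets 7, 8 and 9. The timestamps are copied from the tshark output above; the helper name `gaps` is just for illustration.

```python
# (packet number, capture time in seconds, TCP flags) copied from the capture above
packets = [
    (7, 0.106121, "ACK"),        # last packet before the connection goes idle
    (8, 60.057794, "RST, ACK"),  # reset sent by Logstash
    (9, 116.003717, "SYN"),      # Filebeat opens a new connection
]

def gaps(rows):
    """Return (from_pkt, to_pkt, seconds between them) for consecutive packets."""
    return [(a[0], b[0], round(b[1] - a[1], 6)) for a, b in zip(rows, rows[1:])]

for first, second, seconds in gaps(packets):
    print(f"packet {first} -> packet {second}: {seconds:.3f} s")
```

The connection is idle for just under 60 seconds before the RST, which matches the default client_inactivity_timeout, and Filebeat reconnects roughly 56 seconds after that.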
I understand the behaviour of client_inactivity_timeout, but why is the close sent as a RST rather than a FIN? A FIN would allow a graceful teardown of the Filebeat <> Logstash connection and prevent the "connection reset by peer" error message.
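For reference, here is a minimal sketch of the difference as seen from the client side, using plain Python sockets on loopback. It only shows one common way a RST can be produced (closing with SO_LINGER enabled and a zero timeout). It is an assumption, not a claim, that Logstash does anything similar. The function name `close_and_observe` is made up for this example, and the `struct linger` packing assumes Linux.

```python
import socket
import struct

def close_and_observe(abortive: bool) -> str:
    """Close the server side of a loopback connection and report what the client sees."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()

    if abortive:
        # SO_LINGER on with a zero timeout: close() aborts the connection with a RST
        # (struct linger is two ints on Linux; other platforms may differ)
        server_side.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    server_side.close()  # without SO_LINGER this is a normal close and sends a FIN
    listener.close()

    client.settimeout(2)
    try:
        data = client.recv(1)
        result = "FIN (clean EOF)" if data == b"" else "unexpected data"
    except ConnectionResetError:
        result = "RST (connection reset by peer)"
    finally:
        client.close()
    return result

if __name__ == "__main__":
    print("graceful close:", close_and_observe(abortive=False))
    print("abortive close:", close_and_observe(abortive=True))
```

With a graceful close the client reads an empty result (end of stream) and can tear down its side cleanly. With an abortive close it gets the same kind of "connection reset by peer" error that Filebeat logs above.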
thanks
-jeff