Kafka server keeps too many ESTABLISHED TCP connections from Logstash Windows clients

Hi folks,
My Kafka server receives connections from a number of Logstash shippers running on Windows.
Over the past few days we have been getting alarms for more than 30k ESTABLISHED TCP connections on the Kafka server. It looks as if Kafka is holding on to a large number of stale TCP connections that the Logstash shippers have already closed.

For instance:
On the Windows Logstash shipper (10.10.90.63), there were 62 ESTABLISHED TCP connections to the Kafka server (10.10.50.86:9092):
>> netstat -ano|findstr 10.10.50.86
TCP 10.10.90.63:49595 10.10.50.86:9092 ESTABLISHED 4216
TCP 10.10.90.63:50692 10.10.50.86:9092 ESTABLISHED 4216
...
(Total 62 connections)

On the Kafka server, there were 4743 ESTABLISHED TCP connections from 10.10.90.63:

#netstat -anlp|grep '10.10.50.86:9092'|awk '{print $(NF-1),$5}'|grep 10.10.90.63|awk '{print $1}'|sort |uniq -c
   4743 ESTABLISHED
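
To check whether other shippers show the same pattern, the same count can be broken out per client in one pass. This is only a minimal sketch: `established_by_peer` is a hypothetical helper name, and it assumes Linux `netstat -ant` style output (state in column 6) on stdin.

```shell
# Hypothetical helper: count ESTABLISHED connections per remote IP for one
# local port. Reads Linux `netstat -ant` style output on stdin.
# Usage on the broker: netstat -ant | established_by_peer 9092
established_by_peer() {
  awk -v port="$1" '
    $6 == "ESTABLISHED" {
      n = split($4, l, ":")        # local address: the port is the last piece
      if (l[n] != port) next
      m = split($5, r, ":")        # remote address: the IP precedes the port
      count[r[m - 1]]++
    }
    END { for (ip in count) print count[ip], ip }
  ' | sort -rn                     # largest count first
}
```

Each output line is a connection count followed by the client IP, so the hosts with the most connections appear at the top.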

I then stopped the Logstash service on 10.10.90.63 manually. On the client, the connections first changed to TIME_WAIT and eventually disappeared. On the Kafka server the count dropped by only 62 (4743 - 62 = 4681), which means Kafka was still holding 4681 stale TCP connections from that host:
>> netstat -ano|findstr 10.10.50.86
TCP 10.10.90.63:49595 10.10.50.86:9092 TIME_WAIT 0
TCP 10.10.90.63:50692 10.10.50.86:9092 TIME_WAIT 0
...

#netstat -anlp|grep '10.10.50.86:9092'|awk '{print $(NF-1),$5}'|grep 10.10.90.63|awk '{print $1}'|sort |uniq -c
   4681 ESTABLISHED
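
One way to confirm which server-side entries are stale is to compare the client ports seen on both sides. The sketch below assumes both netstat outputs were saved to files at roughly the same time; `stale_server_ports` and its arguments are hypothetical names, not part of any tool.

```shell
# Hypothetical helper: list client ports that the Kafka server still shows as
# ESTABLISHED but that the client no longer has open.
#   $1 = client IP
#   $2 = file holding the client's Windows `netstat -ano` output
#   $3 = file holding the server's Linux `netstat -ant` output
stale_server_ports() {
  awk -v ip="$1" '
    FILENAME == ARGV[1] {
      # Windows format: TCP <local> <remote> <state> <pid>
      if ($1 == "TCP" && $4 == "ESTABLISHED") {
        n = split($2, a, ":")
        live[a[n]] = 1             # remember every port the client still holds
      }
      next
    }
    # Linux format: tcp <recv-q> <send-q> <local> <remote> <state>
    $6 == "ESTABLISHED" {
      n = split($5, a, ":")
      if (a[n - 1] == ip && !(a[n] in live)) print a[n]
    }
  ' "$2" "$3"
}
```

For example, `stale_server_ports 10.10.90.63 client.txt server.txt | wc -l` would give the number of connections that exist only on the server side.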

I have mitigated the issue by restarting the Kafka service, but I don't think this should happen on a production server. Has anyone faced a similar problem?

  • Kafka TCP connection count over the last few months: [graph]

  • After manually stopping the Kafka service: [graph]

Kafka and Logstash version info is below. Let me know if you need any further information. Thanks.

Package Versions:
Kafka: kafka_2.10-0.8.2.1
Kafka OS: CentOS 6.5/ 2.6.32-431.el6.x86_64
Logstash-Shipper: logstash-2.3.1
Logstash-Shipper OS: Windows Server 2008 R2

CentOS 6.5 sysctl settings:

#cat /etc/sysctl.conf | grep tcp
net.ipv4.tcp_max_tw_buckets = 20000
net.ipv4.tcp_sack = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_rmem = 4096 87380 4194304
net.ipv4.tcp_wmem = 4096 16384 4194304
net.ipv4.tcp_max_orphans = 3276800
net.ipv4.tcp_max_syn_backlog = 262144
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_synack_retries = 1
net.ipv4.tcp_syn_retries = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_mem = 94500000 915000000 927000000
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 1200
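
For reference, `tcp_keepalive_time` only applies to sockets that have SO_KEEPALIVE enabled, and a dead peer is dropped only after the follow-up probes also go unanswered. A rough sketch of that arithmetic follows. `keepalive_detect_seconds` is a hypothetical helper, and the 75 second interval and 9 probes in the usage line are the Linux defaults, assumed here because `tcp_keepalive_intvl` and `tcp_keepalive_probes` are not set in the listing above. Whether the broker's sockets actually enable keepalive is not established by anything in this post.

```shell
# Hypothetical helper: approximate worst-case seconds before the kernel drops
# an idle connection whose peer has vanished (SO_KEEPALIVE sockets only).
#   $1 = tcp_keepalive_time, $2 = tcp_keepalive_intvl, $3 = tcp_keepalive_probes
keepalive_detect_seconds() {
  echo $(( $1 + $2 * $3 ))
}

# Usage with the value above plus the assumed Linux defaults:
#   keepalive_detect_seconds 1200 75 9
# On the broker, the live values can be read from
#   /proc/sys/net/ipv4/tcp_keepalive_time
#   /proc/sys/net/ipv4/tcp_keepalive_intvl
#   /proc/sys/net/ipv4/tcp_keepalive_probes
```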

Logstash Configuration:

output {
  if [type] == "IIS" {
    #stdout { codec => rubydebug }
    kafka {
      bootstrap_servers => "10.10.50.57:9092"
      topic_id => "iis"
    }
  }
  else if [type] == "win-log" {
    #stdout { codec => rubydebug }
    kafka {
      bootstrap_servers => "10.10.50.86:9092,10.10.50.87:9092,10.10.50.88:9092"
      topic_id => "oslog"
    }
  }
}