Filebeat dropping connection to Logstash


(Sai Birada) #1

To try Filebeat in production, I launched about 1.1k Filebeat instances on my production boxes. Each one monitors a couple of files on its box and sends data to 3 central Logstash servers, which modify the events, drop some of them, and push the rest to an Elasticsearch cluster. Since the launch I have seen a pattern: every day some of the Filebeat instances drop their connections to Logstash. For example, on the first day 1100 Filebeats were sending data to Logstash, on the second day 1070, on the third day 1030, and within a month the number has fallen to about half. All my Filebeats run on FreeBSD boxes, and I got the Filebeat binary from https://beats-nightlies.s3.amazonaws.com/index.html?prefix=jenkins/filebeat/
I went to those boxes and found the following errors in filebeat.log:
2016-07-18T03:31:00-07:00 ERR SSL client failed to connect with: read tcp 113.29.215.219:48835->52.36.210.197:443: read: connection reset by peer
2016-07-18T03:32:13-07:00 ERR SSL client failed to connect with: read tcp 113.29.215.219:59477->52.36.210.197:443: read: connection reset by peer
2016-07-18T03:33:43-07:00 ERR SSL client failed to connect with: read tcp 113.29.215.219:56951->52.36.210.197:443: i/o timeout
2016-07-18T03:35:14-07:00 ERR SSL client failed to connect with: read tcp 113.29.215.219:15389->52.36.210.197:443: i/o timeout
2016-07-18T03:36:44-07:00 ERR SSL client failed to connect with: read tcp 113.29.215.219:52429->52.36.210.197:443: i/o timeout
I don't think this is an SSL or firewall issue: until a couple of days earlier this particular Filebeat was sending logs to Logstash, and when I tried connecting with curl and telnet I could reach Logstash. It seems to me that it is somehow a Filebeat issue. Below is my sample Filebeat configuration file, which is modified for every instance.

filebeat:
  prospectors:
    -
      paths:
        - /sc/log/setcainfo.log
      fields:
        hostip: "localipaddress"
      document_type: setcainfo_Etc/GMT:timeadjust

    -
      paths:
        - /var/log/messages
      fields:
        hostip: "localipaddress"
      document_type: messages_Etc/GMT:timeadjust

    -
      paths:
        - /root/.bash_history
      fields:
        hostip: "localipaddress"
      document_type: bashhistory_Etc/GMT:timeadjust


output:
  logstash:
    hosts: ["52.26.234.26:443"]
    tls:
      certificate_authorities: ["/sc/filebeat/logstash-forwarder.crt"]

logging:
  to_syslog: false
  to_files: true

  files:
    path: /var/log/filebeat
    name: filebeat.log
    rotateeverybytes: 10485760
    keepfiles: 7
    level: debug
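
A telnet check only proves that the TCP port is reachable, so a check closer to what Filebeat does would be a TLS handshake verified against the same CA file. Below is a minimal Python sketch of such a probe, not anything taken from Filebeat itself. The names `probe` and `classify_failure` are made up for illustration, the host, port and CA path are simply the values from the config above, and turning off host name checking is an assumption (the certificate may not list the IP address).

```python
import socket
import ssl


def classify_failure(exc):
    """Map an exception raised by the probe to a short label."""
    if isinstance(exc, ssl.SSLError):
        return "tls-error"
    if isinstance(exc, socket.timeout):
        return "timeout"
    if isinstance(exc, ConnectionResetError):
        return "reset-by-peer"
    if isinstance(exc, ConnectionRefusedError):
        return "refused"
    return "other: %s" % type(exc).__name__


def probe(host, port, ca_file, timeout=10.0):
    """Open a TCP connection, then run a TLS handshake verified against ca_file."""
    context = ssl.create_default_context(cafile=ca_file)
    # Assumption: only the certificate chain is verified, not the host name,
    # because the server certificate may not carry the IP address as a SAN.
    context.check_hostname = False
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw) as tls:
                return "ok (%s)" % tls.version()
    except OSError as exc:
        # ssl.SSLError, timeouts and connection resets are all OSError subclasses.
        return classify_failure(exc)


# Hypothetical call, using the values from the config above:
# print(probe("52.26.234.26", 443, "/sc/filebeat/logstash-forwarder.crt"))
```

Running this from a box where Filebeat has stopped shipping and from one where it still works would show whether the failure is in the handshake or already at the TCP level.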

(ruflin) #2

Which filebeat version are you using?


(Sai Birada) #3

I am not sure about the Filebeat version, as I downloaded the binary from here:
https://beats-nightlies.s3.amazonaws.com/index.html?prefix=jenkins/filebeat/
It is a somewhat older build, as I downloaded it on March 17, 2016. But it did work, and it is still working on the remaining instances.


(ruflin) #4

OK, it is good to know that you are on the nightly builds. The main problem is that a lot has changed since March. Any chance you could update the machines to the most recent builds?

@steffens Can you also have a look and see if you spot any potential issues here?


(Sai Birada) #5

Is there a known Filebeat issue causing this that has been fixed in recent versions, so that upgrading Filebeat will definitely solve my problem? This old version worked fine and is still working well on 500+ instances. Or is there some issue with my config file? Am I missing some important settings?


(Steffen Siering) #6

I can't really tell what changed since March. There have been some improvements to all outputs, but I don't remember any particular fix or issue regarding connections suddenly timing out. I'd say give more recent nightlies a try.

Next time this happens, a network trace (via tcpdump) of said connection would be nice to have.
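
To line such a trace up with the failures, the timestamps, local ports and error reasons can be pulled out of filebeat.log first. The following is a rough Python sketch that assumes the log lines look exactly like the ones quoted in the first post. The names `parse_failures` and `summarize` are hypothetical helpers, not part of Filebeat.

```python
import re
from collections import Counter

# Matches lines such as:
# <timestamp> ERR SSL client failed to connect with: read tcp <local>-><remote>: <reason>
LINE_RE = re.compile(
    r"^(?P<ts>\S+) ERR SSL client failed to connect with: "
    r"read tcp (?P<local>\S+)->(?P<remote>[^\s:]+:\d+): (?P<reason>.+?)\.?$"
)


def parse_failures(lines):
    """Return one dict (ts, local, remote, reason) per matching log line."""
    failures = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            failures.append(match.groupdict())
    return failures


def summarize(failures):
    """Count how often each failure reason occurs."""
    return Counter(f["reason"] for f in failures)


# Hypothetical usage against the log path from the config in the first post:
# with open("/var/log/filebeat/filebeat.log") as handle:
#     for reason, count in summarize(parse_failures(handle)).items():
#         print(count, reason)
```

The `local` field holds the source address and port of each failed attempt, which is what would need to be matched against the packets in the capture.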

