I/O timeout using Filebeat and Logstash

lishunan246 · October 3, 2017, 9:09am

I am using Filebeat 5.5.1 and Logstash 5.5.1 on Debian 8 to collect logs and send them to InfluxDB.
Filebeat and Logstash are installed together on each of 20 servers, all forwarding logs to a single InfluxDB server.
I use logstash-input-beats, logstash-output-influxdb, and some filter plugins that I wrote myself.
This setup works fine for a long time, but after a month or so it stops working.
Logstash logs no warnings or errors. The Filebeat log looks like this:

2017-10-03T16:31:17+08:00 INFO Non-zero metrics in the last 30s: libbeat.logstash.publish.read_errors=1 libbeat.logstash.published_but_not_acked_events=4096
2017-10-03T16:31:47+08:00 INFO No non-zero metrics in the last 30s
2017-10-03T16:32:17+08:00 INFO Non-zero metrics in the last 30s: libbeat.logstash.call_count.PublishEvents=1 libbeat.logstash.publish.write_bytes=774
2017-10-03T16:32:19+08:00 ERR Failed to publish events caused by: read tcp 127.0.0.1:28184->127.0.0.1:5044: i/o timeout
2017-10-03T16:32:19+08:00 INFO Error publishing events (retrying): read tcp 127.0.0.1:28184->127.0.0.1:5044: i/o timeout
2017-10-03T16:32:47+08:00 INFO Non-zero metrics in the last 30s: libbeat.logstash.publish.read_errors=1 libbeat.logstash.published_but_not_acked_events=4096
2017-10-03T16:33:17+08:00 INFO No non-zero metrics in the last 30s
2017-10-03T16:33:47+08:00 INFO Non-zero metrics in the last 30s: libbeat.logstash.call_count.PublishEvents=1 libbeat.logstash.publish.write_bytes=769
2017-10-03T16:33:49+08:00 ERR Failed to publish events caused by: read tcp 127.0.0.1:28240->127.0.0.1:5044: i/o timeout
2017-10-03T16:33:49+08:00 INFO Error publishing events (retrying): read tcp 127.0.0.1:28240->127.0.0.1:5044: i/o timeout
2017-10-03T16:34:17+08:00 INFO Non-zero metrics in the last 30s: libbeat.logstash.publish.read_errors=1 libbeat.logstash.published_but_not_acked_events=4096
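To make the pattern in the log easier to see, here is a minimal sketch that pulls the timestamps of the "i/o timeout" errors out of the Filebeat log and measures the gap between them. The function names (`timeout_times`, `gaps_seconds`) are made up for illustration, and the sample only contains the two ERR lines shown above.

```python
from datetime import datetime

# Two ERR lines copied verbatim from the Filebeat log above.
LOG = """\
2017-10-03T16:32:19+08:00 ERR Failed to publish events caused by: read tcp 127.0.0.1:28184->127.0.0.1:5044: i/o timeout
2017-10-03T16:33:49+08:00 ERR Failed to publish events caused by: read tcp 127.0.0.1:28240->127.0.0.1:5044: i/o timeout
"""


def timeout_times(log_text):
    """Return the timestamps of ERR lines that end with 'i/o timeout'."""
    times = []
    for line in log_text.splitlines():
        parts = line.split(" ", 2)  # timestamp, level, message
        if len(parts) == 3 and parts[1] == "ERR" and parts[2].endswith("i/o timeout"):
            # Drop the "+08:00" offset; every line in this log uses the same zone.
            times.append(datetime.strptime(parts[0][:19], "%Y-%m-%dT%H:%M:%S"))
    return times


def gaps_seconds(times):
    """Return the seconds elapsed between consecutive timestamps."""
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    print(gaps_seconds(timeout_times(LOG)))
```

On this sample the two timeouts are 90 seconds apart, and each one comes from a new local source port, so Filebeat appears to reconnect and retry on a regular cycle rather than fail at random.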

I set the JVM heap size for Logstash to 4 GB. The jstat -gcutil and top output for the Logstash process looks like this:

  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT   
  0.00   2.72  33.98  29.74  91.43  85.72 249381 5888.720    82    4.375 5893.095

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 48725 root      20   0 22.159g 4.265g   8116 S   2.7  3.4  20702:43 java
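As a rough way to read the gcutil line, the sketch below turns the cumulative counters into average pause times. The helper names (`parse_gcutil`, `summarize`) are made up for illustration; the header and values are the ones pasted above. It only looks at a single sample, so it cannot show a trend over time.

```python
# Header and values copied from the jstat -gcutil output above.
HEADER = "S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT"
VALUES = "0.00   2.72  33.98  29.74  91.43  85.72 249381 5888.720    82    4.375 5893.095"


def parse_gcutil(header, values):
    """Map each jstat column name to its numeric value."""
    return dict(zip(header.split(), (float(v) for v in values.split())))


def summarize(stats):
    """Derive average pause times (ms) from the cumulative GC counters."""
    return {
        # YGCT and FGCT are cumulative seconds; YGC and FGC are event counts.
        "avg_young_gc_ms": 1000 * stats["YGCT"] / stats["YGC"],
        "avg_full_gc_ms": 1000 * stats["FGCT"] / stats["FGC"],
        # O is old-generation utilization as a percentage of its capacity.
        "old_gen_used_pct": stats["O"],
    }


if __name__ == "__main__":
    for name, value in summarize(parse_gcutil(HEADER, VALUES)).items():
        print(f"{name}: {value:.1f}")
```

With these numbers the average young collection takes about 23.6 ms and the average full collection about 53.4 ms, with the old generation at 29.74% of its capacity. On its own that does not look like an exhausted heap, but a single sample cannot rule out a leak, particularly one outside the Java heap.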

Restarting Filebeat does not solve the problem, and Logstash does not stop on SIGTERM, so I have to use SIGKILL and then restart Logstash. After that, things work for several weeks but eventually fail again.
Does anyone have an idea what is wrong? Is the heap size too small, or is there a memory leak somewhere?
