ERR Failed to publish events caused by: read tcp IP:40634->IP:5044: i/o timeout

Hello,

I am seeing these errors:

ERR Failed to publish events caused by: read tcp
INFO Error publishing events (retrying): read tcp

filebeat.prospectors:
- input_type: log

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
#    - /var/log/*.log
    - /archives/logs/tomcat7-8080/download.log
    - /archives/logs/tomcat7-8090/download.log
#    - /etc/filebeat/test/download.2017-06-16-1125-1925.log
#     - /etc/filebeat/test/download.2017-06-16-1125-1224.log
#filebeat.spool_size: 4096
  tail_files: true


filebeat.publish_async: true

output.logstash:
  # The Logstash hosts
  hosts: ["lvsyslogstash1.lv.jabodo.com:5044","lvsyslogstash2.lv.jabodo.com:5044"]
  loadbalance: true
  worker: 2
#  bulk_max_size: 2048
  # Optional SSL. By default it is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

I am not sure how to use bulk_max_size: and filebeat.spool_size:, so I have commented them out.
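For reference, a minimal sketch of where these two options live in a Filebeat 5.x config. spool_size is a top-level filebeat setting, while bulk_max_size belongs to the logstash output; the values shown are the 5.x defaults, not tuning recommendations:

```yaml
# Top-level filebeat settings (not nested under any output):
filebeat.spool_size: 2048       # events collected by the spooler before flushing (default: 2048)

output.logstash:
  hosts: ["lvsyslogstash1.lv.jabodo.com:5044"]
  bulk_max_size: 2048           # max events per batch sent to Logstash (default: 2048)
```

Note that filebeat.publish_async is likewise a top-level setting, not part of output.logstash.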

Error on filebeat

2017-06-21T16:06:19-04:00 INFO Metrics logging every 30s
2017-06-21T16:06:19-04:00 INFO States Loaded from registrar: 50
2017-06-21T16:06:19-04:00 INFO Loading Prospectors: 1
2017-06-21T16:06:19-04:00 INFO Starting spooler: spool_size: 2048; idle_timeout: 5s
2017-06-21T16:06:19-04:00 INFO Prospector with previous states loaded: 35
2017-06-21T16:06:19-04:00 INFO Starting prospector of type: log; id: 2918137255160190488
2017-06-21T16:06:19-04:00 INFO Loading and starting Prospectors completed. Enabled prospectors: 1
2017-06-21T16:06:19-04:00 INFO Starting Registrar
2017-06-21T16:06:19-04:00 INFO Start sending events to output
2017-06-21T16:06:19-04:00 INFO Harvester started for file: /archives/logs/tomcat7-8090/download.log
2017-06-21T16:06:19-04:00 INFO Harvester started for file: /archives/logs/tomcat7-8080/download.log
2017-06-21T16:06:49-04:00 INFO Non-zero metrics in the last 30s: filebeat.harvester.open_files=2 filebeat.harvester.running=2 filebeat.harvester.started=2 libbeat.logstash.call_count.PublishEvents=1 libbeat.logstash.publish.write_bytes=629 libbeat.publisher.published_events=2011
2017-06-21T16:06:49-04:00 ERR Failed to publish events caused by: read tcp 10.140.76.11:42172->10.140.223.89:5044: i/o timeout
2017-06-21T16:06:49-04:00 INFO Error publishing events (retrying): read tcp 10.140.76.11:42172->10.140.223.89:5044: i/o timeout
2017-06-21T16:07:19-04:00 INFO Non-zero metrics in the last 30s: libbeat.logstash.call_count.PublishEvents=1 libbeat.logstash.publish.read_errors=1 libbeat.logstash.publish.write_bytes=640 libbeat.logstash.published_but_not_acked_events=2011
2017-06-21T16:07:19-04:00 ERR Failed to publish events caused by: read tcp 10.140.76.11:50068->10.140.223.90:5044: i/o timeout
2017-06-21T16:07:19-04:00 INFO Error publishing events (retrying): read tcp 10.140.76.11:50068->10.140.223.90:5044: i/o timeout
[... the same i/o timeout then repeats every 30 seconds, alternating between 10.140.223.89:5044 and 10.140.223.90:5044, with libbeat.logstash.published_but_not_acked_events=2011 on every retry ...]

Which Logstash and Filebeat versions are you using?

Hello Steffens,
I am using:
filebeat version 5.4.0 (amd64), libbeat 5.4.0
logstash version 5.4

Modifying the grok pattern has significantly reduced the CPU load on my Logstash servers, but it is still near 40%.

A slow/stale Logstash should not result in an i/o timeout error, as Logstash sends a heartbeat signal every 5 seconds while a batch of events is in progress.

Where are Filebeat and Logstash running, and how are they connected? Are there any firewalls, NAT, or other network equipment in the middle that might be closing or dropping connections?

As a workaround, increase the timeout setting in the Filebeat logstash output (it defaults to 60 seconds) and see how it goes. I wonder whether it's due to LS not sending the heartbeat, or to network equipment.
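As a sketch, raising the timeout would look like this (the hosts are taken from the config above; the value of 120 seconds is only an illustration):

```yaml
output.logstash:
  hosts: ["lvsyslogstash1.lv.jabodo.com:5044", "lvsyslogstash2.lv.jabodo.com:5044"]
  loadbalance: true
  timeout: 120   # network read/write timeout in seconds (default: 60)
```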

Thanks Steffens. It looks like my long grok pattern was causing the issue. After I modified it, the errors were gone.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.