Filebeat: send fail; Error publishing events (retrying): read tcp 10.13.37.73:49990->10.13.37.99:5044: i/o timeout

Hello! I am new to the ELK Stack. Someone else installed Logstash and Elasticsearch on the Logstash server and Filebeat on another server, but a problem remains and I am responsible for solving it.

The problem seems to be that the connection between Filebeat and the Logstash server is failing. When I run 'systemctl status filebeat -l', I get the following:

[ruoyu@compute-20 filebeat]$ systemctl status filebeat -l
● filebeat.service - filebeat
Loaded: loaded (/usr/lib/systemd/system/filebeat.service; enabled; vendor preset: disabled)
Active: active (running) since Sun 2016-07-17 14:45:06 EDT; 19h ago
Docs: https://www.elastic.co/guide/en/beats/filebeat/current/index.html
Main PID: 18016 (filebeat)
CGroup: /system.slice/filebeat.service
└─18016 /usr/bin/filebeat -c /etc/filebeat/filebeat.yml -v

Jul 18 09:56:28 compute-20.moc.ne.edu /usr/bin/filebeat[18016]: single.go:159: backoff retry: 1m0s
Jul 18 09:58:58 compute-20.moc.ne.edu /usr/bin/filebeat[18016]: single.go:76: Error publishing events (retrying): read tcp 10.13.37.73:54950->10.13.37.99:5044: i/o timeout
Jul 18 09:58:58 compute-20.moc.ne.edu /usr/bin/filebeat[18016]: single.go:152: send fail
Jul 18 09:58:58 compute-20.moc.ne.edu /usr/bin/filebeat[18016]: single.go:159: backoff retry: 1m0s
Jul 18 10:00:08 compute-20.moc.ne.edu /usr/bin/filebeat[18016]: single.go:76: Error publishing events (retrying): EOF
Jul 18 10:00:08 compute-20.moc.ne.edu /usr/bin/filebeat[18016]: single.go:152: send fail
Jul 18 10:00:08 compute-20.moc.ne.edu /usr/bin/filebeat[18016]: single.go:159: backoff retry: 1m0s
Jul 18 10:02:38 compute-20.moc.ne.edu /usr/bin/filebeat[18016]: single.go:76: Error publishing events (retrying): read tcp 10.13.37.73:55110->10.13.37.99:5044: i/o timeout
Jul 18 10:02:38 compute-20.moc.ne.edu /usr/bin/filebeat[18016]: single.go:152: send fail
Jul 18 10:02:38 compute-20.moc.ne.edu /usr/bin/filebeat[18016]: single.go:159: backoff retry: 1m0s

However, when I test the connection with 'telnet 10.13.37.99 5044', it succeeds:
[ruoyu@compute-20 filebeat]$ telnet 10.13.37.99 5044
Trying 10.13.37.99...
Connected to 10.13.37.99.
Escape character is '^]'.
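
For reference, the telnet check above can be reproduced in a few lines of Python. This is only a sketch, and `can_connect` is a hypothetical helper name. Keep in mind that a successful connect only shows that something is listening on the port; it does not show that Logstash is accepting and acknowledging events.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened.

    This only tests reachability of the port (like telnet); it says
    nothing about whether the application behind it is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example call for the setup in this thread:
#   can_connect("10.13.37.99", 5044)
```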

Below is the filebeat.yml file on the other server (not the Logstash server):
filebeat:
  prospectors:
    -
      paths:
          # - /var/log/*.log
        - /var/log/secure
        - /var/log/messages
          # - /var/log/ceph/*
          # - /var/log/nova/*
          # - /var/log/neutron/*
          # - /var/log/openvswitch/*
          # - /var/log/cinder/*
          # - /var/log/glance/*
          # - /var/log/horizon/*
          # - /var/log/httpd/*
          # - /var/log/keystone/*

      encoding: plain
      fields_under_root: false
      input_type: log
      ignore_older: 24h
      document_type: syslog
      scan_frequency: 10s
      harvester_buffer_size: 16384
      tail_files: false
      force_close_files: false
      backoff: 1s
      max_backoff: 10s
      backoff_factor: 2
      partial_line_waiting: 5s
      max_bytes: 10485760
      spool_size: 1024
      idle_timeout: "15s"
  registry_file: /var/lib/filebeat/registry
output:
  logstash:
    hosts: ["10.13.37.99:5044"]
  shipper: {}
  logging:
    level: info
  runoptions: {}

Below is the Logstash configuration:

input {
  beats {
    port => 5044
  }
}

output {
  elasticsearch {
    hosts => ["10.13.37.99:9200"]
    sniffing => true
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}

Any help is greatly appreciated. Thank you very much!

Which filebeat and logstash version are you using?

Hi ruflin,

The filebeat version is: filebeat version 1.2.3 (amd64)
The logstash version is: logstash 2.2.4

Thanks for asking!

Best,
Ruoyu Chen

How about the connection between LS and ES? Do you use the most recent beats-input version in LS?

The default I/O timeout is 30 seconds. Normally the Logstash circuit breaker will close the connection after 5 seconds. Is there anything in the Logstash logs?
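
To relate this to the Filebeat log in the first post: the "i/o timeout" lines would match Filebeat waiting past its timeout for a response, and the "EOF" lines would match the other side closing the connection. That is my reading of the messages, not something confirmed in this thread. A rough Python sketch for tallying the two cases is below; `classify`, `summarize` and the category labels are made-up names for illustration.

```python
import re
from collections import Counter

# Matches the "Error publishing events (retrying): <reason>" lines
# shown in the systemctl output earlier in this thread.
PUBLISH_ERROR = re.compile(r"Error publishing events \(retrying\): (?P<reason>.*)$")


def classify(reason):
    """Map an error reason to a coarse category (labels are illustrative)."""
    reason = reason.strip()
    if reason.endswith("i/o timeout"):
        return "timeout"          # no response within the timeout
    if reason == "EOF":
        return "closed_by_peer"   # connection closed by the other side
    return "other"


def summarize(lines):
    """Count publish errors per category over an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = PUBLISH_ERROR.search(line)
        if match:
            counts[classify(match.group("reason"))] += 1
    return counts


sample = [
    "single.go:76: Error publishing events (retrying): read tcp 10.13.37.73:54950->10.13.37.99:5044: i/o timeout",
    "single.go:152: send fail",
    "single.go:76: Error publishing events (retrying): EOF",
    "single.go:76: Error publishing events (retrying): read tcp 10.13.37.73:55110->10.13.37.99:5044: i/o timeout",
]
print(summarize(sample))
```

If timeouts dominate, the receiving side is probably stalled rather than unreachable, which is why the Logstash logs are the next place to look.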

Hi Ruflin,

Thanks for the reply! Below is the log output from Logstash. It says 'the
pipeline is blocked' and 'No space left on device'. We think this may be
because Filebeat has shipped so many log files to the Logstash server
that the pipeline became blocked?

We have one server running both Logstash and Elasticsearch, and many
other compute nodes from which Filebeat collects log files.

{:timestamp=>"2016-07-22T17:20:22.610000-0400",
:message=>"CircuitBreaker::rescuing exceptions", :name=>"Beats input",
:exception=>LogStash::Inputs::Beats::InsertingToQueueTakeTooLong,
:level=>:warn}
{:timestamp=>"2016-07-22T17:20:22.622000-0400", :message=>"Beats input: The
circuit breaker has detected a slowdown or stall in the pipeline, the input
is closing the current connection and rejecting new connection until the
pipeline recover.",
:exception=>LogStash::Inputs::BeatsSupport::CircuitBreaker::HalfOpenBreaker,
:level=>:warn}
{:timestamp=>"2016-07-22T17:20:22.622000-0400", :message=>"Beats input: The
circuit breaker has detected a slowdown or stall in the pipeline, the input
is closing the current connection and rejecting new connection until the
pipeline recover.",
:exception=>LogStash::Inputs::BeatsSupport::CircuitBreaker::HalfOpenBreaker,
:level=>:warn}
{:timestamp=>"2016-07-22T17:20:23.070000-0400", :message=>"Beats input: the
pipeline is blocked, temporary refusing new connection.",
:reconnect_backoff_sleep=>0.5, :level=>:warn}
{:timestamp=>"2016-07-22T17:20:23.072000-0400",
:message=>"CircuitBreaker::Open", :name=>"Beats input", :level=>:warn}
{:timestamp=>"2016-07-22T17:20:23.073000-0400", :message=>"Beats input: The
circuit breaker has detected a slowdown or stall in the pipeline, the input
is closing the current connection and rejecting new connection until the
pipeline recover.",
:exception=>LogStash::Inputs::BeatsSupport::CircuitBreaker::OpenBreaker,
:level=>:warn}
{:timestamp=>"2016-07-22T17:20:23.573000-0400", :message=>"Beats input: the
pipeline is blocked, temporary refusing new connection.",
:reconnect_backoff_sleep=>0.5, :level=>:warn}

=>"node-32.moc.ne.edu", "tags"=>["beats_input_codec_plain_applied"]},
@metadata_accessors=#<LogStash::Util::Accessors:0x158c67ae
@store={"type"=>"log", "beat"=>"filebeat"}, @lut={"[type]"=>[{"type"=>"log", "beat"=>"filebeat"},
"type"], "[beat]"=>[{"type"=>"log", "beat"=>"filebeat"}, "beat"]}>,
@cancelled=false>],
 :response=>{"create"=>{"_index"=>"filebeat-2016.07.23", "_type"=>"log",
"_id"=>"AVYUet5p6ls66B28P6c0", "status"=>500,
"error"=>{"type"=>"exception", "reason"=>"failed to sync translog", "caused_by"=>{"type"=>"i_o_exception",
"reason"=>"No space left on device"}}}}, :level=>:warn}
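
The "No space left on device" error in the response above comes from Elasticsearch failing to write its translog, so checking free disk space on the Elasticsearch data path is a reasonable next step. Below is a minimal Python sketch of such a check. `disk_report` and the 10% threshold are illustrative choices, and the data path has to be supplied by you; it is not given in this thread.

```python
import shutil


def disk_report(path, min_free_fraction=0.10):
    """Report disk usage for the filesystem containing `path`.

    `min_free_fraction` is an arbitrary illustrative threshold, not an
    Elasticsearch setting.
    """
    usage = shutil.disk_usage(path)
    # Guard against filesystems that report a total size of zero.
    free_fraction = usage.free / usage.total if usage.total else 0.0
    return {
        "path": path,
        "total_bytes": usage.total,
        "free_bytes": usage.free,
        "free_fraction": free_fraction,
        "low": free_fraction < min_free_fraction,
    }


# Example call, with the path replaced by your Elasticsearch data directory:
#   disk_report("/path/to/elasticsearch/data")
```

If the report shows the disk is full, Elasticsearch cannot index, Logstash cannot flush its output, and the beats input stops accepting connections, which would be consistent with the timeouts seen on the Filebeat side.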
