Metricbeat ERR Failed to publish events caused by: read tcp 127.0.0.1:56820->127.0.0.1:5044: i/o timeout


(Rich Horace) #1

I'm getting the errors below when metricbeat publishes to logstash.

Versions:
logstash 5.4.0
metricbeat version 5.4.0 (amd64), libbeat 5.4.0

Metricbeat Log:

2017-05-11T00:57:41Z ERR Failed to publish events caused by: read tcp 127.0.0.1:43188->127.0.0.1:5044: i/o timeout
2017-05-11T00:58:33Z ERR Failed to publish events caused by: read tcp 127.0.0.1:46240->127.0.0.1:5044: i/o timeout
2017-05-11T00:59:08Z ERR Failed to publish events caused by: read tcp 127.0.0.1:49648->127.0.0.1:5044: i/o timeout
2017-05-11T01:00:00Z ERR Failed to publish events caused by: read tcp 127.0.0.1:51550->127.0.0.1:5044: i/o timeout
2017-05-11T01:00:53Z ERR Failed to publish events caused by: read tcp 127.0.0.1:54062->127.0.0.1:5044: i/o timeout
2017-05-11T01:01:28Z ERR Failed to publish events caused by: read tcp 127.0.0.1:56820->127.0.0.1:5044: i/o timeout

Netstat details:

netstat -an | grep 5044
tcp        0      0 0.0.0.0:5044                0.0.0.0:*                   LISTEN      
tcp        0      0 127.0.0.1:46818             127.0.0.1:5044              ESTABLISHED 
tcp        0      0 127.0.0.1:5044              127.0.0.1:46818             ESTABLISHED 
tcp        0      0 172.19.13.145:45044         34.202.71.250:3000          ESTABLISHED 
tcp        0      0 172.19.13.145:3000          52.52.206.128:50442         ESTABLISHED 
tcp        0      0 172.19.13.145:50444         54.67.1.143:3000            ESTABLISHED

Metricbeat Config:
metricbeat.modules:

- module: system
  metricsets:
    - cpu
    - load
    - core
    - diskio
    - filesystem
    - fsstat
    - memory
    - network
    - process
  enabled: true
  period: 10s
  processes: ['.*']

name: prod_aerospike-v3_10_0_3mm-euwest1-01b

output.logstash:
  hosts: ["localhost:5044"]

Logstash Input:
input { beats { port => 5044 tags => ["metricbeat"] } }

Thanks,
Rich


(Carlos Pérez Aradros) #2

Hi @ritchierich,

Could you please check the logstash logs and post anything you find here?

Best regards


(Steffen Siering) #3

More complete logs with debug logging enabled (-d '*') would help us see when metricbeat started publishing the events and when the i/o timeout was triggered.

Is Logstash stuck? The error happens when metricbeat is waiting for an ACK or keep-alive signal from Logstash. Normally logstash will send a keep-alive signal every few seconds, resetting the timer in metricbeat. Maybe you can get a pcap with tcpdump so we can see if communication takes place properly.

What happens if you increase the timeout from the default 30s to, e.g., 1h? Is Logstash processing events? Can metricbeat send more events?
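As a minimal sketch of the two diagnostics above, assuming the `timeout` setting of the Logstash output (in seconds) and the `logging.*` settings in metricbeat.yml, with the hosts value taken from the config in post #1:

```yaml
output.logstash:
  hosts: ["localhost:5044"]
  # Seconds to wait for a response from Logstash before timing out (default 30).
  # 3600 = 1h; intended for diagnosis only, not as a production value.
  timeout: 3600

# Config-file equivalent of running metricbeat with -d '*'.
logging.level: debug
logging.selectors: ["*"]
```

If the errors disappear with the longer timeout, that would suggest Logstash is slow to acknowledge rather than unreachable.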


(Rich Horace) #4

@exekias @steffens

Here are some more details about my Elastic Stack configuration. I haven't had the opportunity to put logging into debug mode or change the timeout. I did try adding -Djava.net.preferIPv4Stack=true to jvm.options, but no luck. Below is logstash-json.log from the same server.

Elastic Stack configuration:

  • Same configuration running in multiple AWS regions

  • Logstash - All regions are processing application logs

  • Metricbeat - Only one region is experiencing metricbeat timeouts

  • ELK classic pipeline: app Logs/metricbeat => logstash => redis

      {"level":"WARN","loggerName":"logstash.agent","timeMillis":1494465032949,"thread":"LogStash::Runner","logEvent":{"message":"stopping pipeline","id":"main"}}
      {"level":"INFO","loggerName":"logstash.pipeline","timeMillis":1494465047832,"thread":"[main]-pipeline-manager","logEvent":{"message":"Starting pipeline","id":"main","pipeline.workers":16,"pipeline.batch.size":125,"pipeline.batch.delay":5,"pipeline.max_inflight":2000}}
      {"level":"INFO","loggerName":"logstash.inputs.beats","timeMillis":1494465048597,"thread":"[main]-pipeline-manager","logEvent":{"message":"Beats inputs: Starting input listener","address":"0.0.0.0:5044"}}
      {"level":"INFO","loggerName":"logstash.pipeline","timeMillis":1494465048631,"thread":"[main]-pipeline-manager","logEvent":{"message":"Pipeline main started"}}
      {"level":"INFO","loggerName":"logstash.agent","timeMillis":1494465048678,"thread":"Api Webserver","logEvent":{"message":"Successfully started Logstash API endpoint","port":9600}}

(Steffen Siering) #5

You can configure metricbeat to push to redis directly. Why do you need to run metricbeat and logstash on same host?
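A rough sketch of what that could look like in metricbeat.yml, assuming the Redis output's `hosts`, `key`, and `datatype` settings. The host below is a hypothetical placeholder, and the key and list type are assumptions that must match whatever the downstream consumer reads from:

```yaml
output.redis:
  # Hypothetical address; replace with the actual Redis host(s).
  hosts: ["redis.example.internal:6379"]
  # Assumed key name; must match the key the indexer reads from.
  key: "logstash"
  # Publish with RPUSH to a Redis list (the alternative is "channel").
  datatype: list
```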

The logstash logs only contain startup information. Have you got some debug logs as well?

Have you collected any debug logs from metricbeat?

In case you want/need to use logstash, can you try logstash with only a null or stdout output (no redis output) and see if it's processing any data?


(Rich Horace) #6

@steffens @exekias

Thanks for the help and suggestions! The metricbeat i/o timeouts threw me off. I'm in the last phase of migrating from ES 1.7.1 to Elastic Stack 5, and part of my strategy is an apples-to-apples migration, which is why I'm sending metricbeat to logstash.

But after diving into it further the following changes solved the problem.

  • Metricbeat: Increased the period from 10s to 60s (less frequent collection)
  • Logstash Shipper: Added batch and batch_events to the Redis output
  • Logstash Indexer: Added flush_size to the Elasticsearch output
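
As a sketch, the Metricbeat change amounts to editing `period` in the module section from post #1; everything else is assumed to be unchanged:

```yaml
metricbeat.modules:
- module: system
  metricsets:
    - cpu
    - load
    - core
    - diskio
    - filesystem
    - fsstat
    - memory
    - network
    - process
  enabled: true
  # Was 10s; collecting every 60s sends roughly one sixth as many events.
  period: 60s
  processes: ['.*']
```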

Logstash Shipper:
output { redis { host => {{ elastic_stack.shipper_redis.hosts }} shuffle_hosts => true data_type => "list" batch => "true" batch_events => "250" key => "logstash" } }

Logstash Indexer:
output { elasticsearch { flush_size => 4000 hosts => {{ elastic_stack.indexer_redis.hosts }} } }

Cheers,
Rich

