We lose some metrics from time to time, blocked by Logstash

Robin_Guo · September 14, 2018, 4:51am

Dear Elastic,

Recently we have been losing some metrics from time to time.
Could someone suggest how to solve this problem?

Error from Metricbeat

2018-09-14T11:03:37+08:00 ERR Failed to publish events caused by: EOF
2018-09-14T11:17:37+08:00 ERR Failed to publish events caused by: EOF
2018-09-14T11:20:37+08:00 ERR Failed to publish events caused by: EOF

Error from Cisco ISR

10.0.0.1 is our HAProxy server; our Logstash cluster sits behind it.

Sep 13 2018 12:26:35: %SYS-3-LOGGINGHOST_FAIL: Logging to host 10.0.0.1 port 5050 failed

Error from Cisco ASA

Sep 13 2018 12:27:29 testfw01 : %ASA-3-414003: TCP Syslog Server NETW:10.0.0.1/5049 not responding, New connections are permitted based on logging permit-hostdown policy

Configuration from HAProxy

vim /etc/haproxy/haproxy.cfg
defaults
    log                     global
    mode                    tcp
    option                  dontlognull
    option                  redispatch
    retries                 3
    #timeout http-request    60s
    timeout queue           300s
    timeout connect         300s
    timeout client          300s
    timeout server          300s
    #timeout http-keep-alive 60s
    timeout check           60s
    maxconn                 50000



#---------------------------------------------------------------------
# main frontend which proxies to the backends
#---------------------------------------------------------------------
frontend  metricbeat
     bind *:5044
     mode tcp
     timeout client 300s
     default_backend     metricbeat2ES
     maxconn 2000
#---------------------------------------------------------------------
# round robin balancing between the various backends
#---------------------------------------------------------------------
backend metricbeat2ES
    mode     tcp
    timeout server 300s
    balance  roundrobin
    server  robinlogstash01 10.0.0.1:5044    check  maxconn 2000
    server  robinlogstash02 10.0.0.2:5044    check  maxconn 2000
    server  robinlogstash03 10.0.0.3:5044    check  maxconn 2000
    server  robinlogstash04 10.0.0.4:5044    check  maxconn 2000

Logstash pipeline for Metricbeat

vim /etc/logstash/conf.d/metricbeat.conf

#logstash for pipeline metricbeat

input {
  beats {
    port => 5044
    client_inactivity_timeout => 300
    #ssl => false
    #ssl_verify_mode => "none"
   # codec => json {
   #  charset => "UTF-8"
   #}
  }
}


filter {

  if "_jsonparsefailure" in [tags] {
        drop { }
  }


}


output {
  # the file output is a plugin block; its destination goes in "path"
  file {
    path => "/tmp/metricbeat.log"
  }
}

yaauie · September 14, 2018, 6:21pm

EOF means End-Of-File: whatever Metricbeat was connected to hung up on it, so it was unable to send the metric.

The Beats protocol uses long-lived TCP connections instead of establishing a new connection each time there is an event to be sent. Interrupting this connection could be the cause of the above issues.

I am not an HAProxy expert, but the configuration posted looks like it closes connections that have been idle for 300s (the timeout server and timeout client directives), which would be a probable cause of early termination of connections. I would advise setting timeout connect to something like 30s, and setting the timeout server and timeout client values much, much higher.
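
As a rough sketch of that advice (untested, and the 1h values are only an illustrative assumption for "much, much higher", not a recommendation), the relevant parts of the posted haproxy.cfg might look like this. Note that the posted frontend and backend sections set their own timeout client and timeout server, so those overrides would need raising as well:

```
defaults
    mode                    tcp
    # time allowed to establish the TCP connection to a Logstash node
    timeout connect         30s
    # idle timeouts; 1h is an assumed example value
    timeout client          1h
    timeout server          1h

frontend  metricbeat
     bind *:5044
     mode tcp
     # overrides the default, so raise it here too
     timeout client 1h
     default_backend     metricbeat2ES

backend metricbeat2ES
    mode     tcp
    # overrides the default, so raise it here too
    timeout server 1h
    balance  roundrobin
    server  robinlogstash01 10.0.0.1:5044    check  maxconn 2000
    server  robinlogstash02 10.0.0.2:5044    check  maxconn 2000
    server  robinlogstash03 10.0.0.3:5044    check  maxconn 2000
    server  robinlogstash04 10.0.0.4:5044    check  maxconn 2000
```

The Logstash beats input in the posted pipeline also sets client_inactivity_timeout => 300, so it may be worth checking whether that setting is closing idle connections too.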
