Filebeat 5.0.1 crashing

Hi,

I have a setup of Filebeat -> Logstash -> Elasticsearch on a 16-core RHEL server.

Two days back I upgraded Filebeat from 1.0.0 to 5.0.1 and Logstash from 2.1.1 to 2.4.0. Filebeat has suddenly stopped twice in these two days (both times at around 5:00 AM). I observed the error below in the Filebeat log:

err failed to publish events caused by: EOF

This never happened with the earlier version (1.0.0), which had been running for the past year. Please help if I am doing something wrong here, as I am new to the ELK stack.

Below are my configurations for both versions:

Configuration (only the uncommented parts) for the previous Filebeat version:

################### Filebeat Configuration Example #########################

############################# Filebeat ######################################
filebeat:
  # List of prospectors to fetch data.
  prospectors:
    -
      paths:
        - /usr/local/apps/log/log.*
      input_type: log

  registry_file: /var/lib/filebeat/registry

output:
  ### Logstash as output
  logstash:
    # The Logstash hosts
    hosts: ["10.0.0.274:5044"]

    # Number of workers per Logstash host.
    worker: 4

logging:
  files:
    rotateeverybytes: 10485760 # = 10MB

Configuration for the updated Filebeat version (only the uncommented parts):

################### Filebeat Configuration Example #########################

############################# Filebeat ######################################
filebeat.prospectors:
  # List of prospectors to fetch data.
  - paths:
      - /usr/local/apps/log/log.*
    input_type: log
    ignore_older: 24h

output.logstash:
  # The Logstash hosts
  hosts: ["10.0.0.274:5044"]

  # Number of workers per Logstash host. default 1
  worker: 4

logging.level: error
logging.to_files: false
logging.to_syslog: false

Logstash.conf (for both versions):

input {
  file {
    path => "/usr/local/apps/monitoring/monitor.log"
    add_field => { "iname" => "monitor" }
    codec => "json"
  }
  beats {
    port => 5044
    codec => "json"
    congestion_threshold => 30
    add_field => { "iname" => "bee" }
  }
  beats {
    port => 5045
    codec => "json"
    add_field => { "iname" => "mis" }
  }
}
filter {
  mutate {
    remove_field => [ "fields", "input_type", "offset", "host", "beat", "n", "h", "p", "v", "l" ]
  }
  date {
    match => [ "t", "ISO8601", "UNIX_MS" ]
  }
  if [iname] == "mis" {
    geoip {
      source => "ip"
    }
  }
}
output {
  if [iname] == "bee" {
    elasticsearch {
      hosts => ["10.0.0.278:9200"]
      document_id => "%{[j][cid]}-%{[j][msgid]}"
      index => "logstash-%{[iname]}-%{+YYYY.MM.dd}"
      workers => 8
    }
  } else {
    elasticsearch {
      hosts => ["10.0.0.278:9200"]
      index => "logstash-%{[iname]}-%{+YYYY.MM.dd}"
      workers => 8
    }
  }
}

How can you tell filebeat is crashing? All I see is the 'EOF' log message. EOF (end of file) means the remote host closed the network connection. In this case filebeat will reconnect and continue sending.

Which logstash-input-beats version do you have installed? The default plugin version in LS 2.4 might have a bug that closes the connection from time to time.
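For reference, a quick way to check the installed plugin version is the plugin CLI; bin/plugin is the Logstash 2.x command name (5.x renamed it to bin/logstash-plugin), and the install path below is only an assumption.

```shell
# List the installed beats input plugin and its version.
# /opt/logstash is an assumed install location; adjust to your setup.
cd /opt/logstash
bin/plugin list --verbose | grep logstash-input-beats
```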

I can see that the filebeat process does not show up in ps -ef | grep filebeat. Uploading screenshots from the time when filebeat crashed.

Currently my architecture is as follows:

I have 2 filebeat processes - filebeat1 and filebeat2 (both on x.x.x.236) - and 2 Logstash instances (logstash1 on x.x.x.237 and logstash2 on x.x.x.236). Filebeat1 connects to logstash1 on port 5044 and filebeat2 connects to logstash2 on port 5045.

Both Logstash instances have the same logstash.conf configuration:

input {
  file {
    path => "/usr/local/apps/monitoring/monitor.log"
    add_field => { "iname" => "monitor" }
    codec => "json"
  }
  beats {
    port => 5044
    codec => "json"
    congestion_threshold => 30
    add_field => { "iname" => "bee" }
  }
  beats {
    port => 5045
    codec => "json"
    add_field => { "iname" => "mis" }
  }
}
filter {
  mutate {
    remove_field => [ "fields", "input_type", "offset", "host", "beat", "n", "h", "p", "v", "l" ]
  }
  date {
    match => [ "t", "ISO8601", "UNIX_MS" ]
  }
  if [iname] == "mis" {
    geoip {
      source => "ip"
    }
  }
}
output {
  if [iname] == "bee" {
    elasticsearch {
      hosts => ["x.x.x.95:9200"]
      document_id => "%{[j][cid]}-%{[j][msgid]}"
      index => "logstash-%{[iname]}-%{+YYYY.MM.dd}"
      workers => 8
    }
  } else {
    elasticsearch {
      hosts => ["x.x.x.95:9200"]
      index => "logstash-%{[iname]}-%{+YYYY.MM.dd}"
      workers => 8
    }
  }
}

Also, I observed (in the uploaded server screenshots) a low amount of free memory and a high buffer cache. Could this be a reason for filebeat crashing?

Unfortunately, logging was not enabled on the server. I will enable logging on the server and share the results if it crashes again.

Please properly format your posts (e.g. use the </> button) and don't use screenshots; otherwise your posts are very hard to read.

I hope you have the two filebeat instances configured with different registry files.
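For illustration, the two instances would each need their own registry path in the Filebeat 5.x config; the file names below are just an example.

```
# filebeat1.yml
filebeat.registry_file: /var/lib/filebeat/registry-filebeat1

# filebeat2.yml
filebeat.registry_file: /var/lib/filebeat/registry-filebeat2
```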

Have you checked the system logs for the reason filebeat was stopped? If the kernel stops a process due to OOM or a segfault, it normally shows up in the logs.
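Something like the following would surface OOM-killer or segfault entries; /var/log/messages is the usual syslog destination on RHEL (on systemd-based systems journalctl -k reads the kernel log instead).

```shell
# Look for OOM-killer or segfault messages mentioning filebeat.
grep -iE 'out of memory|oom|segfault' /var/log/messages | grep -i filebeat
# The kernel ring buffer may still hold recent entries:
dmesg | grep -iE 'oom|killed process|segfault'
```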

No other filebeat logs, stack trace, or core dump?

Have you ever checked filebeat memory usage?
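One simple way to track this is to sample the process periodically (e.g. from cron); the ps field names below are standard on Linux.

```shell
# Print CPU%, memory%, and resident set size (KiB) for all filebeat processes.
ps -C filebeat -o pid,pcpu,pmem,rss,etime,args
```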

What is start-beat.sh doing? Why not use the init/systemd scripts shipped with filebeat? I don't know which init system your RHEL version is using, but systemd can be configured to restart the service once it becomes unavailable, and the init scripts use a filebeat-god wrapper that restarts filebeat once it is stopped.
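As an illustration, a minimal systemd unit with automatic restart could look like this; the paths and unit name are assumptions, not what the filebeat package necessarily ships.

```
# /etc/systemd/system/filebeat.service (illustrative)
[Unit]
Description=Filebeat log shipper
After=network.target

[Service]
ExecStart=/usr/share/filebeat/bin/filebeat -c /etc/filebeat/filebeat.yml
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```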

Hi Steffen,

Thanks for your suggestions.

Unfortunately, logging was not enabled when filebeat crashed, and nothing was observed in the system logs either. I have now enabled logging in filebeat and will check when it crashes again. No stack trace was seen at the time of the crash.
As per the logs, average CPU utilization by filebeat is 80%, and memory usage is normally very low (0.2%). I am not sure about the exact numbers at the time of the crash.
start-beat.sh was used to launch the filebeat process. I will plan to implement and use the init scripts for sure.
I will come back with more details if filebeat crashes again.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.