I have a setup of filebeat -> logstash -> elasticsearch on a 16-core RHEL machine.
Two days ago I upgraded filebeat from 1.0.0 to 5.0.1 and logstash from 2.1.1 to 2.4.0. Since then filebeat has stopped unexpectedly twice (both times at around 5:00 AM). I observed the error below in filebeat:
err failed to publish events caused by: EOF
This never happened with the earlier version (1.0.0), which had been running for the past year. Please help if I am doing something wrong here, as I am new to the ELK stack.
Below are my configurations for both versions.
Configuration in the previous filebeat version (uncommented parts only):
################### Filebeat Configuration Example #########################
############################# Filebeat ######################################
filebeat:
  # List of prospectors to fetch data.
  prospectors:
      paths:
        - /usr/local/apps/log/log.*
      input_type: log
  registry_file: /var/lib/filebeat/registry
output:
  ### Logstash as output
  logstash:
    # The Logstash hosts
    hosts: ["10.0.0.274:5044"]
    # Number of workers per Logstash host.
    worker: 4
logging:
  files:
    rotateeverybytes: 10485760 # = 10MB
Configuration in the updated filebeat version (uncommented parts only):
################### Filebeat Configuration Example #########################
############################# Filebeat ######################################
#filebeat:
# List of prospectors to fetch data.
filebeat.prospectors:
    paths:
      - /usr/local/apps/log/log.*
    input_type: log
    ignore_older: 24h
output.logstash:
  # The Logstash hosts
  hosts: ["10.0.0.274:5044"]
  # Number of workers per Logstash host. default 1
  worker: 4
logging.level: error
logging.to_files: false
logging.to_syslog: false
How can you tell filebeat is crashing? All I see is the 'EOF' log message. EOF (end of file) means the remote host closed the network connection. In this case filebeat will reconnect and continue sending.
Which logstash-input-beats version do you have installed? The default plugin version in LS 2.4 might have a bug that closes the connection from time to time.
The filebeat process no longer shows up in ps -ef|grep filebeat. I am uploading screenshots from the time filebeat crashed.
Currently my architecture is as follows :
I have 2 filebeat processes, filebeat1 and filebeat2 (both on x.x.x.236), and 2 logstash components (logstash1 on x.x.x.237 and logstash2 on x.x.x.236). Filebeat1 connects to logstash1 on port 5044 and filebeat2 connects to logstash2 on port 5045.
Please format your posts properly (e.g. use the </> button) and don't use screenshots; your posts are very hard to read otherwise.
I hope you have the two filebeat instances configured with different registry files.
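As a rough sketch of what separate registries could look like with the 5.x config layout: each instance gets its own config file with its own `filebeat.registry_file`. The file names and registry paths below are made up for illustration and are not taken from the poster's setup.

```yaml
# filebeat1.yml (hypothetical file name) - instance shipping to logstash1
filebeat.registry_file: /var/lib/filebeat1/registry   # assumed path, unique to this instance

# filebeat2.yml (hypothetical file name) - instance shipping to logstash2
filebeat.registry_file: /var/lib/filebeat2/registry   # assumed path, must differ from filebeat1
```

If both instances wrote to the same registry file, they would overwrite each other's read offsets.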
Have you checked the system logs for the reason filebeat was stopped? If the kernel stops a process due to OOM or a segfault, it normally shows up in the logs.
No other filebeat logs, stack trace log or core dump?
Have you ever checked filebeat memory usage?
What's start-beat.sh doing? Why not use the init/systemd scripts shipped with filebeat? I don't know which init system your RHEL version is using, but systemd can be configured to restart the service once it becomes unavailable, and the init scripts use a filebeat-god script that restarts filebeat once it has stopped.
Unfortunately logging was not enabled when filebeat crashed, and nothing showed up in the system logs either. I have now enabled logging in filebeat and will check when it crashes again. No stack trace was seen at the time of the crash.
As per the logs, average CPU utilization by filebeat is 80% and its memory usage is normally very low (0.2%). I am not sure about the exact figures at the time of the crash. start-beat.sh was used to launch the filebeat process. I will plan to switch to the init scripts.
I will come back with more details if filebeat crashes again.
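For reference, file logging in a 5.x filebeat.yml could be enabled roughly as follows. This is only a sketch: the log directory, file name, level and retention count are assumptions, not the values actually used here.

```yaml
logging.level: info            # assumed level; the earlier config used "error"
logging.to_files: true         # the earlier config had this set to false
logging.to_syslog: false
logging.files:
  path: /var/log/filebeat      # assumed log directory
  name: filebeat.log           # assumed log file name
  rotateeverybytes: 10485760   # = 10MB, same rotation size as the old 1.0.0 config
  keepfiles: 7                 # assumed number of rotated files to keep
```

With file logging enabled, whatever filebeat logs before it stops should be preserved in the configured directory.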