I'm new to this discussion webpage but have been using the ELK stack for a fair amount of time now, and this page has been a great help for me.
I was recently looking into the full config yml file for Filebeat and didn't see a specific setting to shut down the Filebeat service itself if it can't connect to ES or LS after a few tries, or if it has been trying for a long time.
The reason for my question is that we are trying to make certain Beats a part of a platform load, and in some networks we may have no connection back to our cluster. Our concern is whether constantly trying to connect to Logstash and so on will cause any networking issues or flooding.
I would just like to know if there is already something to handle this, or whether I need to script something to take care of it. I know there is filebeat.shutdown_timeout, but I'm not sure if it applies to my scenario.
I appreciate your response and thank you in advance.
There is no option in Beats to shut down if the remote is not reachable. Instead, Beats use exponential backoff on retry, starting with a one-second sleep, then 2, 4, 8, 16, 32, up to a maximum of 60 seconds. Unfortunately these settings are not configurable yet.
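To make the retry schedule described above concrete, here is a minimal sketch of a capped exponential backoff. It is not Filebeat's actual code; the function name backoff_delays and its parameters are hypothetical, chosen only to reproduce the 1, 2, 4, ... up to 60 second sequence mentioned in the reply.

```python
from itertools import islice


def backoff_delays(initial=1, factor=2, maximum=60):
    """Yield retry delays in seconds, doubling each time and capped at `maximum`.

    Hypothetical helper: it only illustrates the schedule described in the
    thread and is not taken from the Beats source.
    """
    delay = initial
    while True:
        yield min(delay, maximum)
        # Keep the stored delay capped so it never grows without bound.
        delay = min(delay * factor, maximum)


# First eight delays of the schedule.
print(list(islice(backoff_delays(), 8)))  # [1, 2, 4, 8, 16, 32, 60, 60]
```

Once the cap is reached, every further attempt waits the same 60 seconds, which is why an unreachable remote ends up seeing roughly one connection attempt per minute.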
When do you plan to restart filebeat? How do you detect Logstash becoming available?
We are mainly concerned about the case where Filebeat becomes part of our OS load and gets deployed somewhere our ES/LS cluster is not reachable for any reason. In that case restarting Filebeat won't be necessary unless a connection becomes available later, in which case we will restart Filebeat manually.
Since logs will rotate and only a few files are kept, I believe filling up the logs at the info level won't be a problem, especially if we increase the scanning period, which is 10s by default, if I'm not mistaken. It looks like we will need a process during the load install that pings our cluster and does not start the Filebeat service if the cluster is not reachable.
Are you aware of any network problems if Filebeat keeps trying to reach LS or ES and it can't?
So, how do you determine when LS or the connection becomes available? Do you continuously ping (try to connect via TCP) and start Filebeat if a connection attempt is successful (using/generating up to 2 TCP connections)? That's what Filebeat is already doing, with intervals of up to 1 minute per connection attempt (we will make this behavior configurable). If Filebeat gets no connection, all processing in Filebeat will be blocked. The prospector still scans for new files and harvesters start reading them, but the harvesters will be blocked once the Filebeat spooler is full.
What kind of network problems are you referring to? It's basically one TCP SYN packet per Beat, about every minute. Having a few thousand Beats trying to connect at exactly the same time will give you that many TCP SYN packets. If the remote is not reachable at all, the connection attempt will time out. If the remote is visible but the service is not running, the remote host normally answers with a TCP RST (or a firewall in between may return an ICMP unreachable message), telling the OS the port is not available.
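As a rough back-of-the-envelope check of the load described above, the sketch below estimates the average SYN rate for a fleet of Beats. Everything here is an assumption for illustration: the function names are hypothetical, the 74-byte frame size is a typical figure for a SYN with TCP options on Ethernet and not a measured value, and syns_per_attempt is left at 1 even though an OS may retransmit the SYN a few times when it gets no answer.

```python
def syn_rate(beats, retry_interval_s=60.0, syns_per_attempt=1):
    """Average SYN packets per second across all Beats (hypothetical helper)."""
    return beats * syns_per_attempt / retry_interval_s


def syn_bandwidth_bps(beats, retry_interval_s=60.0, syns_per_attempt=1, syn_bytes=74):
    """Average bits per second spent on SYN packets.

    syn_bytes=74 is an assumed frame size, not a measured one.
    """
    return syn_rate(beats, retry_interval_s, syns_per_attempt) * syn_bytes * 8


# Example: 3000 Beats, each retrying about once a minute.
print(syn_rate(3000))           # 50.0 packets per second on average
print(syn_bandwidth_bps(3000))  # 29600.0 bits per second on average
```

These are averages. If all Beats start at the same moment, their retries can line up and arrive as a burst once per minute, which is the "exactly the same time" case mentioned above.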
Sorry for the late response as I had to work on a new big elastic project with my team.
I believe for now we have decided to just keep the beat service shutdown/stopped by default, and we will determine when we need it to start later on.
The intention was to have some sort of cron job that monitors the connection from the Beat source (the VM where the Beat is installed) and takes action based on the connection state.
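A cron-driven check along those lines could look roughly like the sketch below. It is only an illustration under stated assumptions: the helper names is_reachable and decide_action are hypothetical, and the host logstash.example.internal and port 5044 in the comment are placeholders, not values from this thread. The probe is a plain TCP connect, the same kind of attempt Filebeat itself makes.

```python
import socket


def is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and DNS resolution failures.
        return False


def decide_action(reachable, service_running):
    """Map the probe result and current service state to 'start', 'stop' or 'none'."""
    if reachable and not service_running:
        return "start"
    if not reachable and service_running:
        return "stop"
    return "none"


# Hypothetical use from a cron job (host and port are placeholders):
#   action = decide_action(is_reachable("logstash.example.internal", 5044), running)
#   ...then start or stop the Filebeat service with the platform's service manager.
```

Keeping the decision separate from the probe makes it easy to test the logic without touching the network or the service manager.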
Thank you for your replies and the information provided.