We have 40 instances running filebeat and a centralized logstash.
Our application rotates the log files based on size and deletes them after 5 files. So during high-volume peak hours, logs get queued on our servers, and the queue only gets cleared in non-peak hours. Example count of one server below:
We run filebeat as a service on the servers, with multiple paths configured. Grok and multiline are currently handled in logstash. Now we plan to move the multiline handling to filebeat.
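For reference, a minimal sketch of what the multiline config could look like on the filebeat (1.x) side; the path and pattern below are placeholders, assuming each log event starts with a timestamp:

```yaml
filebeat:
  prospectors:
    - paths:
        - /var/log/myapp/*.log            # placeholder path
      multiline:
        # assumption: each event begins with a date like "2017-01-31"
        pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
        negate: true
        match: after                      # glue continuation lines to the previous event
```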
First, I strongly recommend updating all your components to the 5.x releases. Also have a look at https://github.com/logstash-plugins/logstash-input-beats/issues/201 for the potential issues with multiline on the LS side. Glad to hear you are moving it.
The only solution I can see for the above is to keep more rotated files in your rotation algorithm, so files stay around not just because filebeat still holds them open (and the OS keeps them on disk), but because they genuinely still exist.
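As an illustration only: the application in this thread does its own rotation, but if rotation were driven by logrotate instead, keeping more files would look something like this (path and size are placeholders):

```
/var/log/myapp/*.log {
    size 100M      # rotate on size, as in the original setup
    rotate 20      # keep 20 rotated files instead of 5
    missingok
    notifempty
}
```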
One potential solution for the above could be to add Kafka or Redis to the equation to queue events during peak load.
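A sketch of the Kafka variant, assuming filebeat 5.x (the kafka output was added in 5.0); the broker and topic names are placeholders, and logstash would then consume the events via its kafka input:

```yaml
# filebeat.yml (5.x): ship to Kafka instead of directly to logstash
output.kafka:
  hosts: ["kafka1:9092"]     # placeholder broker
  topic: "filebeat-logs"     # placeholder topic
```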
Thanks Rufin. Upgrading might need some rounds of testing, I believe.
Are you recommending upgrading only filebeat to stop this queuing of files? In the current scenario I don't have the option of restarting filebeat; if I do, all these open (lsof) files would be removed from the instance. Do we have any hack to push these queued files to logstash?
I would recommend upgrading to filebeat 5.x. You could keep the rest of the stack, but make sure to update LS to the most recent version of the beats input plugin. Obviously in the best case you would update the complete stack, but as you said: that needs testing first to make sure everything keeps working.
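Updating the beats input plugin is done from the Logstash install directory; the command name depends on the LS version:

```
bin/plugin update logstash-input-beats            # Logstash 2.x
bin/logstash-plugin update logstash-input-beats   # Logstash 5.x
```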
Are you saying that filebeat never fully catches up with the files on disk? Because then you definitely have an issue. Or is it only during peak periods?
It catches up in non-peak hours, but before it catches up fully, the next peak hours begin. So we added a new logstash server for the 4 servers where logs are generated in high volume, and moved the multiline handling to filebeat (still not moved to 5.x).
Can I decrease the scan frequency for only this log_type? Will that stop the queuing? Or is there any other suggestion to stop this queuing completely? Since the logs are queued they are not real time; they arrive after 2 hours.
There is a known issue with open file handles in the 1.x releases which can happen under heavy load because of race conditions. This is one of the reasons the event handling was rewritten for the 5.x releases. As Filebeat should also work with the older versions of Logstash, would it be an option for you to just upgrade filebeat?
I don't see how decreasing scan_frequency would solve this issue, but it can be configured per prospector.
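For completeness, a sketch of per-prospector scan_frequency in filebeat 1.x; the paths are placeholders, and 10s is the default:

```yaml
filebeat:
  prospectors:
    - paths:
        - /var/log/myapp/high-volume-*.log   # placeholder
      scan_frequency: 30s                    # poll this prospector less often
    - paths:
        - /var/log/myapp/app.log             # placeholder
      scan_frequency: 10s                    # filebeat default
```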