I followed the migration guide to migrate my log inputs to filestream inputs, including adding a unique id and setting the "take_over" option to true. However, upon restarting the filebeat service, all of the logs are reharvested, resulting in a huge spike of millions of messages that brings my Graylog server to its knees.
I'd like to switch to filestreams since log inputs are deprecated and slated for removal, but if I can't migrate smoothly, I will not be able to roll it out to my production servers where the spike in reharvested logs would be 100x bigger.
Question: are you running this in the same instance that already loaded the data with the log input? Just checking.
This method relies on reusing the registry data that tracks progress. If you just run this in a new Filebeat instance, it will try to load all the logs/files from the beginning.
I'm running filebeat 7.17.0. I was running 7.14.0 prior to the restart where I switched from log inputs to filestream inputs, in case that makes a difference.
Yes, I'm running it in the same instance that was previously using log inputs. I updated the input configs, updated the filebeat binaries from 7.14.0 to 7.17.0, and restarted the filebeat service. I then saw the flood of duplicate log messages in Graylog.
Filebeat does not provide access to the state information of different inputs. Hence, the filestream input cannot access the state information of a log input in the Filebeat registry. You must exclude the files the log input has processed or is processing. If you do not exclude those files, you will end up with duplicate events in the output.
I noticed in the upgrade docs that it's recommended to upgrade to 7.17 first before upgrading to 8.x. Considering that, it sounds like I will have to do the following:
1. switch back to log inputs
2. upgrade to 7.17.0
3. upgrade to 8.7.0
4. migrate to filestream inputs using the take_over option
Does that look correct to you?
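For reference, here is a minimal sketch of what I understand the input config for the last step would look like on 8.7. The id value and the path are placeholders I made up for illustration, not my real config, and the exact take_over syntax should be checked against the docs for the version actually being run, since I'm assuming it is a plain boolean on the input:

```yaml
filebeat.inputs:
  # Was previously "type: log" with the same paths.
  - type: filestream
    # Hypothetical id; it has to be unique per filestream input and
    # should not change afterwards, since state is tracked under it.
    id: app-logs
    # Assumption: tells the filestream input to take over the state
    # that the log input stored in the registry for these files.
    take_over: true
    paths:
      # Placeholder path for illustration only.
      - /var/log/app/*.log
```

As noted above, this only helps if it runs in the same Filebeat instance, with the same registry, that the log input was using.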
We do rotate some of our log files, but not all. And even for the ones we rotate, we let them grow pretty large before rotating, so we would want a smooth transition that avoids sending lots of duplicate messages.