I have the worst logfile to try to parse. It is a complete XML document with a trailing closing tag, and each time the logfile is written to, new entries are inserted before that closing tag. This is playing havoc with Filebeat's offset-based tailing: it cuts off the start of each new XML log entry, the multiline pattern doesn't match, and Logstash can't parse the XML.
Is there anything that can tell Filebeat to re-read the last X bytes of a file, or some other way to handle XML-document-based logfiles?
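To illustrate the structure (tag and attribute names here are made up), the file looks something like this, with each write inserting a new entry just before the closing tag:

```xml
<log>
  <entry timestamp="2017-01-01T10:00:00">first event</entry>
  <entry timestamp="2017-01-01T10:05:00">second event</entry>
  <!-- each new entry is inserted here, before </log> -->
</log>
```

Filebeat's stored offset points at the old end of the file, i.e. just past `</log>`, so when a new entry is inserted, the first few bytes of it (the length of the closing document tag) land before that offset and never get read.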
Interesting use case. TBH that's the first time I've seen something like this. What kind of application / tool is writing these log files? Is the whole file rewritten every time?
TBH I currently don't see any way we could deal with this in Filebeat.
It's one of our studio automation applications. I don't know if the whole file or just the tail is overwritten.
The software isn't very robust, and finding out which part of it is malfunctioning is one of the reasons we deployed ELK. You can tell the developers don't come from a server sysadmin background, because the log files are a total PITA to parse: XML sometimes embedded in CDATA inside XML, odd file naming conventions, and odd logfile rotation.
Fortunately for me, the closing document tag is short enough that only the opening tag of the new log entry gets cut off, so I can recreate the opening tag with a bit of Ruby code based on its closing tag.
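For the curious, here is a minimal sketch of that repair as a Logstash `ruby` filter. The assumptions are mine: that the clipped bytes fall entirely inside the opening tag, and that each entry ends with a simple closing tag like `</entry>`; any attributes on the original opening tag are lost.

```ruby
filter {
  ruby {
    code => '
      msg = event.get("message")
      # A clipped entry starts mid-way through its opening tag instead of
      # with "<", but still ends with an intact closing tag, e.g. </entry>.
      if msg !~ /\A</ && (m = msg.match(%r{</(\w+)>\s*\z}))
        # Drop the mangled remainder of the opening tag (everything up to
        # and including the first ">") and rebuild a bare opening tag from
        # the closing tag name. Attributes on the original tag are lost.
        body = msg.sub(/\A[^>]*>/, "")
        event.set("message", "<#{m[1]}>#{body}")
      end
    '
  }
}
```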
I'm guessing a custom XML Beat would have to be written that keeps track of the number of entries at a particular XPath and only sends the differences.
Not sure how hard this would be to implement in Filebeat. I think all it needs is a footer-size setting that tells it to rewind that many bytes when resuming tailing of the logfile; skipping sending those bytes to Logstash could be optional.
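To be clear, no such setting exists in Filebeat today; this is just a sketch of what I imagine the proposed option could look like in filebeat.yml (the path, pattern, and `footer_bytes` key are all made up):

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/studio-app/*.xml   # hypothetical path
    multiline:
      pattern: '^<entry'            # hypothetical entry tag
      negate: true
      match: after
    # Hypothetical setting, does not exist: rewind this many bytes
    # (the length of the closing document tag) when resuming the file.
    footer_bytes: 7
```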
The above could potentially be done, but I'm not sure how tricky it would be. A lot of Filebeat's logic assumes that lines are only ever appended, so you would also have to deal with different offset storage, etc. Since this format will give you quite a challenge with other tools too, I would advise you to change it, if that is in any way possible. Based on the file structure, I assume the complete file is rewritten every time, otherwise the writing code would be rather "interesting". That means you already have quite an overhead on every write.