Filebeat incorrectly tailing an XML log file

gtvmark · March 8, 2017, 11:51pm

I have the worst logfile to try and parse. It is a complete XML document with a trailing closing tag, and each time the logfile is written to, new entries are inserted before that closing tag. This plays havoc with Filebeat's offset-based tailing: it cuts off the start of each new XML log entry, so the multiline pattern does not match and Logstash is not able to parse the XML.

Is there anything that can tell filebeat to re-read the last X bytes of a file or some other way to handle XML document based logfiles?

This is the basic structure of the file:

<logfile>
<logentry>
...
</logentry>
<logentry>
...
</logentry>
</logfile>

ruflin · March 11, 2017, 4:39pm

Interesting use case. TBH, this is the first time I've seen something like this. What kind of application / tool is writing these log files? Is the whole file rewritten every time?

I currently don't see any solution on how we could deal with this in Filebeat TBH.

gtvmark · March 13, 2017, 10:12pm

It's one of our studio automation applications. I don't know if the whole file or just the tail is overwritten.
The software isn't very robust, so finding out which part of it is malfunctioning has been one of our goals in deploying ELK. You can tell that the software developers have not come from a server sysadmin background, because the log files are a total PITA to parse: XML is sometimes embedded in CDATA inside XML, and the application uses odd file naming conventions and logfile rotation.

Fortunately for me, the closing document tag is short enough that it only cuts off the opening log entry tag, so I can recreate the opening tag with a bit of Ruby code based on its closing tag.
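For anyone landing here later, the idea of rebuilding the opening tag from the closing tag can be sketched roughly as follows. This is a Python illustration, not the Ruby code used in the actual pipeline; the function name `restore_opening_tag` is hypothetical, and it assumes the fragment ends with the entry's closing tag and that the lost opening tag carried no attributes.

```python
import re


def restore_opening_tag(fragment: str) -> str:
    """Prepend the opening tag implied by the fragment's final closing tag.

    Assumption: the opening tag was cut off completely and had no
    attributes, so it can be rebuilt from the element name alone.
    """
    text = fragment.strip()
    # Find the element name in the last closing tag, e.g. </logentry>.
    match = re.search(r"</([A-Za-z_][\w.\-]*)>\s*$", text)
    if match is None:
        return fragment  # no trailing closing tag, nothing to infer
    name = match.group(1)
    # Leave the fragment alone if it already starts with that element.
    if re.match(rf"<{re.escape(name)}[\s>]", text):
        return text
    return f"<{name}>{text}"
```

A fragment such as `<msg>hi</msg>\n</logentry>` comes back with `<logentry>` prepended, while a fragment that already starts with `<logentry>` is returned as-is. This only works while the footer is no longer than the opening tag; a longer footer would eat into the entry body as well.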

I'm guessing that a custom XML beat monitor would have to be written that keeps track of the number of entries at a particular xpath and only sends the differences.
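A rough sketch of that entry-counting idea, in Python rather than as a real Beat: re-parse the whole document, count the elements at a given path, and emit only the ones beyond the count seen last time. The function name `new_entries` and the `./logentry` path are assumptions for illustration, and it presumes the file is well-formed at the moment it is read.

```python
import xml.etree.ElementTree as ET


def new_entries(xml_text: str, path: str, seen: int):
    """Return (entries beyond the first `seen`, new total) for `path`.

    `path` is an ElementTree path relative to the document root,
    e.g. "./logentry" for direct children named logentry.
    """
    root = ET.fromstring(xml_text)
    entries = root.findall(path)
    if len(entries) < seen:
        seen = 0  # fewer entries than before: assume rotation, start over
    # Serialise only the entries that were not present last time.
    fresh = [ET.tostring(e, encoding="unicode").strip() for e in entries[seen:]]
    return fresh, len(entries)
```

The caller would persist the returned total between runs, much like Filebeat persists a byte offset. The obvious cost is that the whole file is parsed on every pass, so this gets slower as the log grows.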

gtvmark · March 13, 2017, 10:22pm

Not sure how hard this would be to implement in Filebeat. I think all it needs is a footer size setting that tells it to rewind that number of bytes when tailing the logfile. Skipping those bytes when sending to Logstash could be optional.

Just a thought.
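To make the proposal concrete, here is a minimal Python sketch of what such a rewind could look like. It is not an existing Filebeat option; `read_new_bytes` and `footer_size` are hypothetical names, and it assumes the footer has a fixed byte length and that everything before the footer is never modified once written.

```python
import os


def read_new_bytes(path: str, offset: int, footer_size: int):
    """Read what was added since `offset`, rewinding by `footer_size` first.

    Returns (payload, next_offset). `footer_size` is the byte length of
    the closing document tag, including any trailing newline.
    """
    if os.path.getsize(path) < offset:
        offset = 0  # file shrank: treat it as rotated and start over
    # Rewind over the old footer, which new entries have overwritten.
    start = max(0, offset - footer_size)
    with open(path, "rb") as f:
        f.seek(start)
        full = f.read()
    end = start + len(full)
    # Drop the footer from the payload (the "optional" part of the idea).
    payload = full[:-footer_size] if footer_size else full
    return payload, end
```

With a footer of `</logfile>` plus a newline (11 bytes), the first call returns everything up to the footer, and each later call returns only the entries inserted since the stored offset, opening tag included.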

ruflin · March 15, 2017, 10:48pm

The above could potentially be done, but I'm not sure how tricky it is. Lots of things in Filebeat happen based on the assumption that lines are only ever appended, so you would also have to deal with different offset storage etc. As you will have quite a challenge with this format in other tools too, I would advise you to change the format, if that is in any way possible. Based on the file structure, I assume the complete file is written every time, otherwise the code would be rather "interesting". That means you already have quite an overhead writing the file every time.

system · April 12, 2017, 10:48pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
