Grok Pattern for optional XML tags

Hi All,

Apologies if this is a dumb and/or basic question. I had a search through the forums but didn't find anything that quite gave me the help I was after.

I need to use a grok filter to parse events with XML payloads. The problem is that many of the XML elements are optional, so they can be present in some events and entirely missing in others.

To give you a made-up example, I'd like to capture the values of XML elements "a" and "c" in the sample events below, where element "c" is optional.

HEADER TEXT<a>data.1</a><b>data.2</b><c>data.3</c><d>data.4</d><e>data.5</e>
HEADER TEXT<a>data.1</a><b>data.2</b><d>data.4</d><e>data.5</e>

Note that in the second event there is no "c" element, so I wouldn't expect a field to be captured for it in the Logstash output.

I've been fiddling around with the pattern in the online Heroku Grok Debugger but haven't managed to get it right. It either captures the content of "c" correctly when it's present but reports 'no matches' when it's absent, or it doesn't capture "c" even when it's present. This is what I've got so far (which doesn't work :frowning:)

%{GREEDYDATA}<a>%{DATA:a}<%{GREEDYDATA}(?:<c>%{DATA:c})<%{GREEDYDATA} 

The above matches when the "c" element is present but doesn't handle the situation when it's absent.
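
For anyone who wants to reproduce this outside the Grok Debugger, here is a rough Python sketch of the two behaviours I'm seeing. It assumes %{DATA} expands to `.*?` and %{GREEDYDATA} to `.*`, as in the stock grok patterns, and it uses Python's `re` module as a stand-in for grok's regex engine, so treat it as an approximation. The names `EVENTS`, `ORIGINAL`, `WITH_QMARK` and `capture` are made up for the illustration.

```python
import re

EVENTS = [
    "HEADER TEXT<a>data.1</a><b>data.2</b><c>data.3</c><d>data.4</d><e>data.5</e>",
    "HEADER TEXT<a>data.1</a><b>data.2</b><d>data.4</d><e>data.5</e>",
]

# Hand translation of the grok pattern above (assumed expansions:
# GREEDYDATA -> .*  and  DATA -> .*?).
ORIGINAL = re.compile(r".*<a>(?P<a>.*?)<.*(?:<c>(?P<c>.*?))<.*")

# Same pattern with a '?' after the group. The greedy .* in front of the
# group is free to swallow the <c> element, so the group matches nothing.
WITH_QMARK = re.compile(r".*<a>(?P<a>.*?)<.*(?:<c>(?P<c>.*?))?<.*")


def capture(pattern, event):
    """Return the named captures that took part in the match, or None."""
    match = pattern.search(event)
    if match is None:
        return None
    return {name: value for name, value in match.groupdict().items()
            if value is not None}


for event in EVENTS:
    print(capture(ORIGINAL, event), capture(WITH_QMARK, event))
```

With the first pattern the event without "c" gives no match at all; with the second, both events match but "c" is never captured.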

As I say, apologies in advance, as this is probably down to my inexperience with regular expressions. I've sprayed so many '?' characters into different places in the pattern in an attempt to get it working that I'm in danger of running out of them... :slight_smile:

If anyone can help out with a pattern that works against the events above, that would be much appreciated.

cheers,
Steve

Why not use the xml filter to parse the XML?
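
To illustrate the idea of parsing rather than pattern matching, here is a minimal Python sketch. It is not the Logstash xml filter itself, just the same approach using the standard library. It assumes the payload starts at the first `<` in the event and that the sibling elements can be wrapped in a synthetic `<root>` element so they form a well-formed document; `extract_fields` and `wanted` are hypothetical names.

```python
import xml.etree.ElementTree as ET


def extract_fields(event, wanted=("a", "c")):
    """Parse the XML part of an event and return only the elements present."""
    start = event.find("<")
    if start == -1:
        return {}  # no XML payload in this event

    # The sample payload is a run of sibling elements with no single root,
    # so wrap it in a synthetic <root> (an assumption about the real data).
    root = ET.fromstring("<root>" + event[start:] + "</root>")

    fields = {}
    for name in wanted:
        node = root.find(name)
        if node is not None:  # optional elements are simply skipped
            fields[name] = node.text
    return fields


print(extract_fields(
    "HEADER TEXT<a>data.1</a><b>data.2</b><c>data.3</c><d>data.4</d><e>data.5</e>"))
print(extract_fields(
    "HEADER TEXT<a>data.1</a><b>data.2</b><d>data.4</d><e>data.5</e>"))
```

A missing element just produces no field, which is the behaviour asked for. If grok is still preferred, putting a lazy wildcard inside an optional group, along the lines of `<a>%{DATA:a}</a>(?:%{DATA}<c>%{DATA:c}</c>)?`, may also do the job, though that is untested in Logstash here.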

Hi Magnus,

Thanks for the response. The short answer to your question is that I didn't know it existed :confused:. I'm relatively new to Logstash, but I apologise, as I probably should have spotted this...!

I'll check it out, as it sounds like it's built for what I need.

Best regards,
Steve