Hi All,
Apologies if this is a dumb and/or basic question. I had a search through the forums but didn't find anything that quite gave me the help I was after.
I need to use a grok filter to parse events with XML payloads. The problem I have is that many of the XML elements are optional, so they can be present in some events and entirely missing in others.
To give you a made-up example, I'd like to capture the values of XML elements "a" and "c" in the sample events below, where element "c" is optional.
HEADER TEXT<a>data.1</a><b>data.2</b><c>data.3</c><d>data.4</d><e>data.5</e>
HEADER TEXT<a>data.1</a><b>data.2</b><d>data.4</d><e>data.5</e>
Note that in the second event there is no "c" element, so I wouldn't expect a field to be captured for it in the Logstash output.
I've been fiddling around with the pattern but haven't managed to get it right. It either captures the content of "c" correctly when it's present but gives 'no matches' when it's absent, or it doesn't capture it even when it's present. I've been testing with the online Heroku Grok Debugger. This is what I've got so far (which doesn't work):
%{GREEDYDATA}<a>%{DATA:a}<%{GREEDYDATA}(?:<c>%{DATA:c})<%{GREEDYDATA}
The above matches when the "c" element is present but doesn't handle the situation when it's absent.
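For anyone who wants to reproduce this outside the Grok Debugger, here is a rough Python sketch of both behaviours. It assumes the stock grok definitions (%{DATA} as `.*?` and %{GREEDYDATA} as `.*`) and uses Python's `re` module in place of the regex engine grok runs on, so treat it as an approximation. The OPTIONAL variant is a hypothetical example of adding a '?' after the "c" group, not necessarily one of the exact attempts described above; the `captures` helper is also just for illustration.

```python
import re

# Assumed expansions of the stock grok patterns:
#   %{DATA}       -> .*?
#   %{GREEDYDATA} -> .*
# Python spells named captures (?P<name>...) rather than (?<name>...).
ORIGINAL = r".*<a>(?P<a>.*?)<.*(?:<c>(?P<c>.*?))<.*"
# Hypothetical variant: the same pattern with the "c" group made optional.
OPTIONAL = r".*<a>(?P<a>.*?)<.*(?:<c>(?P<c>.*?))?<.*"

WITH_C = "HEADER TEXT<a>data.1</a><b>data.2</b><c>data.3</c><d>data.4</d><e>data.5</e>"
WITHOUT_C = "HEADER TEXT<a>data.1</a><b>data.2</b><d>data.4</d><e>data.5</e>"


def captures(pattern, event):
    """Return the named captures as a dict, or None when the pattern does not match."""
    match = re.search(pattern, event)
    return match.groupdict() if match else None


for name, pattern in (("original", ORIGINAL), ("optional", OPTIONAL)):
    for event in (WITH_C, WITHOUT_C):
        print(name, captures(pattern, event))
```

In this sketch the original pattern captures both "a" and "c" for the first event and fails to match the second, while the optional variant matches both events but leaves "c" empty each time. That looks consistent with the greedy `.*` in front of the group consuming the rest of the line, so the optional group is satisfied by matching nothing.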
As I say, apologies in advance as this is probably due to my inexperience with regular expressions, but I'm in danger of running out of '?' characters: I've sprayed so many into different places in the pattern in an attempt to get it working...
If anyone can help out with a pattern that works against the sample events above, that would be much appreciated.
cheers,
Steve