Normalise and parse XML-encoded tag elements correctly

Hi All

I'm getting started with the ELK stack and need to configure Logstash to parse this log file correctly. The sample data looks something like this:

####<Mar 6, 2020 10:37:41 PM WIB> <> <> <1583509061697> <785592> <ID:<452274.1583509058348.0>> <> <jms_module!Queue_Name> <Action> <user> <Session_identifier> <<the encoded xml>>

Using grok, I was able to parse the entire line correctly with this pattern:

####%{PATTERN:sysdate} %{PATTERN:uniquie_id} %{PATTERN:thread_id} %{PATTERN:another_id} %{PATTERN:queue_name} %{PATTERN:action} %{PATTERN:user} %{PATTERN:session} %{PATTERN:xml}
PATTERN (<[^?]*>)

except for the <<the encoded xml>> part (the Grok Debugger responded with: "Provided Grok patterns do not match data in the input").
This is because, unlike normal XML, which is supposed to look something like this:

<?xml version="1.0" encoding="UTF-8"?>

it is actually written like this in the log file:

&lt;?xml version="1.0" encoding="UTF-8"?&gt;

This encoding(?) applies only to these characters: < (opening angle bracket), > (closing angle bracket), and " (double quote),

which translate to the following, respectively:
< to &lt;
> to &gt;
" to &quot;

I've searched the documentation and the discussions for any clue on how to normalise this kind of case, but had no luck.
What I need is simply to get <<the encoded xml>> as a string and, if possible, decode it to normal XML without the encoding(?) of the tags and double quotes.
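
To make the decoding step concrete, here is a minimal Python sketch of reversing the three substitutions listed above. It runs outside Logstash and is only an illustration: the name `decode_entities` is made up, and it assumes that exactly these three entities occur and nothing else (for example, no `&amp;`).

```python
# Hypothetical helper: reverses only the three substitutions listed above.
ENTITY_MAP = {
    "&lt;": "<",
    "&gt;": ">",
    "&quot;": '"',
}


def decode_entities(text: str) -> str:
    """Replace each encoded entity with its literal character."""
    for entity, char in ENTITY_MAP.items():
        text = text.replace(entity, char)
    return text


if __name__ == "__main__":
    encoded = "&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;"
    print(decode_entities(encoded))
```

Inside Logstash, the same replacements could presumably be expressed with the mutate filter's gsub option applied to the message field. That is untested here.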

Any pointers would be appreciated.
Best regards

Hi All

I just found out that my XML payload, which comes after the:

&lt;?xml version="1.0" encoding="UTF-8"?&gt;

is being parsed with my grok regex (the actual XML is omitted for brevity).
It seems that the question mark (?) in the XML version tag is what causes grok to fail to parse the whole line.
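
That suspicion can be checked with a small Python sketch. It assumes Python's `re` module treats the simple expression `<[^?]*>` the same way grok's regex engine does, and the XML field value is made up for illustration.

```python
import re

# Same expression as the custom PATTERN in the grok config.
FIELD = re.compile(r"<[^?]*>")

plain_field = "<jms_module!Queue_Name>"
# Made-up payload: the encoded XML declaration followed by a dummy element.
xml_field = '<&lt;?xml version="1.0" encoding="UTF-8"?&gt;&lt;a/&gt;>'

# [^?]* matches any run of characters except "?", so a field without a
# question mark matches from its first "<" to its last ">".
print(FIELD.fullmatch(plain_field) is not None)  # True

# The "?" in "&lt;?xml" and "?&gt;" cannot be consumed by [^?]*, so the
# field as a whole can never be matched in one piece.
print(FIELD.fullmatch(xml_field) is not None)  # False
```

So a character class that excludes "?" is enough to explain why the field containing the XML declaration is the one that does not match.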

How should I remove this exact phrase: &lt;?xml version="1.0" encoding="UTF-8"?&gt; from each line of my parsed log (if present) so that grok can parse the following line?
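
One way to think about the removal, sketched in Python rather than Logstash config: match the encoded declaration with a regex and replace it with nothing. The name `strip_xml_declaration` and the sample line are invented for illustration, and the pattern assumes the declaration always starts with `&lt;?xml` and ends with `?&gt;`.

```python
import re

# Matches the encoded declaration: "&lt;?xml", then any characters other
# than "?", then the closing "?&gt;".
XML_DECLARATION = re.compile(r"&lt;\?xml[^?]*\?&gt;")


def strip_xml_declaration(line: str) -> str:
    """Remove every encoded XML declaration from the line, if any."""
    return XML_DECLARATION.sub("", line)


if __name__ == "__main__":
    sample = '<user> <&lt;?xml version="1.0" encoding="UTF-8"?&gt;&lt;a/&gt;>'
    print(strip_xml_declaration(sample))  # <user> <&lt;a/&gt;>
```

Lines without a declaration pass through unchanged. In Logstash the equivalent would presumably be a mutate filter with gsub on the message field, placed before the grok filter. That is an assumption and untested here.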

Best Regards

Hi all

I've been able to process the log message a little more correctly, with a proper config file and grok expression, but there are still some issues left.

In case anyone is interested, I've put the details in a Logstash GitHub issue here

Best Regards
