Hi All
I'm getting started with the ELK stack and need to configure Logstash to parse this log file correctly. The sample data looks something like this:
####<Mar 6, 2020 10:37:41 PM WIB> <> <> <1583509061697> <785592> <ID:<452274.1583509058348.0>> <> <jms_module!Queue_Name> <Action> <user> <Session_identifier> <<the encoded xml>>
Using grok, I was able to parse the entire line correctly with this pattern:
####%{PATTERN:sysdate} %{PATTERN:uniquie_id} %{PATTERN:thread_id} %{PATTERN:another_id} %{PATTERN:queue_name} %{PATTERN:action} %{PATTERN:user} %{PATTERN:session} %{PATTERN:xml}
PATTERN (<[^?]*>)
except for the <<the encoded xml>> part, for which Grok Debugger responded with: "Provided Grok patterns do not match data in the input".
This is because, unlike normal XML, which is supposed to look something like this:
<?xml version="1.0" encoding="UTF-8"?>
it is actually written like this in the log file:
&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
This encoding(?) applies only to these characters: < (opening angle bracket), > (closing angle bracket), and " (double quote), which are translated as follows:
< to &lt;
> to &gt;
" to &quot;
I've searched the documentation and discussions for any clue on how to normalise this kind of case, but had no luck.
What I need is simply to get <<the encoded xml>> as a string and, if possible, decode it to normal XML without the encoding(?) of the angle brackets and double quotes.
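To make the desired result concrete, here is a minimal Python sketch of the extraction and decoding I am after. It is not Logstash configuration, only an illustration of the behaviour. It assumes the encoding is standard HTML/XML entity escaping (&lt;, &gt;, &quot;) and that the payload is always the last bracketed field on the line. The XML payload in the sample, and the names PAYLOAD_RE and extract_xml, are made up for illustration.

```python
import html
import re

# Sample line shaped like the one in the post.
# The XML payload at the end is a made-up stand-in for the real one.
SAMPLE = (
    "####<Mar 6, 2020 10:37:41 PM WIB> <> <> <1583509061697> <785592> "
    "<ID:<452274.1583509058348.0>> <> <jms_module!Queue_Name> <Action> "
    "<user> <Session_identifier> "
    "<&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;&lt;a/&gt;>"
)

# Assumption: the payload is the last bracketed field and begins with an
# encoded "<" (i.e. "&lt;"). The outer "<" and ">" are the field delimiters.
PAYLOAD_RE = re.compile(r"<(&lt;.*)>\s*$", re.DOTALL)


def extract_xml(line: str) -> str:
    """Return the decoded XML payload from one log line."""
    match = PAYLOAD_RE.search(line)
    if match is None:
        raise ValueError("no encoded XML payload found")
    # html.unescape turns &lt; &gt; &quot; back into < > "
    # (it would also decode any other entities, such as &amp;).
    return html.unescape(match.group(1))


print(extract_xml(SAMPLE))
```

The equivalent in Logstash would presumably be a capture of the last field followed by some kind of string replacement on it, which is the part I could not find.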
Any pointers would be appreciated.
Best regards