XML response from unstructured log

Hi Team,
I'm new to Logstash. So far I have configured basic input, Filebeat, and output plugins by following the product documentation on the official website.

Now I have a real scenario, described below.

I have an unstructured log file (abc.log) stored in a local folder. It is a mix of junk information with XML responses in between. I need to extract only the XML responses from the log file through a Logstash pipeline.

Please guide me to achieve this.
Appreciate the quick response!

Thanks
Kishore K

If there is valid XML in the middle of a field, an xml filter with store_xml set to true (the default) will find it and parse it. (xpath, on the other hand, will reject it.)

input { generator { count => 1 lines => [ '2020/01/02 08:40:16 here is some XML: <a><b>1</b><c>2</c></a> and more stuff' ] } }
filter { xml { source => "message" target => "theXML" } }
output { stdout { codec => rubydebug { metadata => false } } }

will get you

    "theXML" => {
        "c" => [
            [0] "2"
        ],
        "b" => [
            [0] "1"
        ]
    },

Hi, Thanks for the prompt response.

I have a small difficulty: the XML response is not being extracted from my log file. Below is a sample of the log file.

2020/01/02 08:40:16 here is some XML: 
    <a>
    <b>1</b>
    <c>2</c>
    </a> 
and more stuff 
2020/01/02 08:40:16 here is some XML:
2020/01/02 08:40:16 here is some XML:
2020/01/02 08:40:16 here is some XML:
2020/01/02 08:40:16 here is some XML:
    <a>
    <b>1</b>
    <c>2</c>
    </a>

The log file above contains junk in between, and each XML response is spread over separate lines.

When I put the XML on one line, like (<a><b>1</b><c>2</c></a>), the output is as you described. Otherwise Logstash treats every single line as a separate event in the output.
Combining all the XML tags onto a single line manually is very difficult, since the log file is huge and contains many XML responses.

Please guide me on how to extract the XML from the above log file.

You will need to get each XML object into its own event. It may be possible to use a multiline codec that matches the closing </a> tag, such as

codec => multiline { pattern => '^\s*</a>' negate => true what => "next" auto_flush_interval => 1 }

or possibly match the date and accept that you need to drop {} the messages that do not contain XML

codec => multiline { pattern => '^[0-9/]{10} [0-9:]{8}' negate => true what => "previous" auto_flush_interval => 1 }
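As a rough sketch of how the date-based variant could fit into a complete pipeline: the file path below is a placeholder, sincedb_path => "/dev/null" assumes a Linux host and is only there so the file is re-read on every test run, and the "<a>" check assumes the root element shown in the sample log. Adjust all three to your environment.

```
input {
  file {
    # Placeholder path, point this at the real abc.log
    path => "/path/to/abc.log"
    start_position => "beginning"
    # Assumption: Linux host; forgets the read position between runs (testing only)
    sincedb_path => "/dev/null"
    # Lines that do not start with a timestamp are appended to the previous line
    codec => multiline {
      pattern => '^[0-9/]{10} [0-9:]{8}'
      negate => true
      what => "previous"
      auto_flush_interval => 1
    }
  }
}
filter {
  # Drop events that contain no XML (assumes the root element is <a>)
  if [message] !~ /<a>/ {
    drop {}
  }
  xml { source => "message" target => "theXML" }
}
output { stdout { codec => rubydebug { metadata => false } } }
```

With the sample log, the timestamp-only lines become their own events and are dropped, while the events that carry an XML block are parsed into theXML.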

Thanks Badger, it worked well.
One small clarification: why does the <a> tag not show up in the 'theXML' object in the output? If we want to show it as well, how can we bring it into the 'theXML' object?

That's just the way it works. The target contains the contents of the outermost XML object, not the outermost object itself.

Understood. But my point is that I have two different XML objects in the log file.
One is <Request> and the other is <Response>, so I have to differentiate between the two.
Please let me know if there is any other way to bring the root tag in.
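
One possible approach, offered only as a sketch and not something confirmed in this thread: since the target does not keep the outermost element, test the raw message for the root tag before parsing and send each kind to its own target. The element names <Request> and <Response> come from the question above; the target field names "Request" and "Response" are made up for illustration.

```
filter {
  # Choose the target field based on which root element the event contains
  if [message] =~ /<Request[\s>]/ {
    xml { source => "message" target => "Request" }
  } else if [message] =~ /<Response[\s>]/ {
    xml { source => "message" target => "Response" }
  } else {
    # No recognised XML in this event
    drop {}
  }
}
```

This assumes each event holds only one XML object, which is what the multiline codec above is meant to ensure.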
