Parsing multiline, nested XML using Logstash

Hello Guys,

I'm currently learning Logstash through a Udemy course. The course focuses on file/http/stdin inputs, but I would like to know whether it is possible to parse the XML shown below. This would help me assess Logstash as one of the possible candidates for this kind of processing.
The XML looks more or less like this:

<?xml version="1.0" encoding="UTF-8"?>
<BATCH TIMESTAMP="2019-01-02T04:04:51.931+01:00" SOFTWARE="1.0" HARDWARE="1.0">
<PLANT NAME="PARIS" LINE="PRODLINE1" TESTER="TESTER1"/>
<PRODUCT ID="3" NAME="MOTHERBOARD" STATUS="OK">
	<GROUP ID="1" NAME="TESTGROUP1">
		<TEST ID="10" NAME="VOLTAGETEST" VALUE="2.34523" STATUS="OK"/>
		<TEST ID="20" NAME="INTEGRATIONTEST" VALUE="1" STATUS="NOK"/>
	</GROUP>
	<GROUP ID="2" NAME="TESTGROUP2">
		<TEST ID="10" NAME="VISUALTEST" VALUE="22" STATUS="OK"/>
	</GROUP>
</PRODUCT>
</BATCH>

This is the output I would like to receive in Elasticsearch (3 records in total):

{"BATCH_TIMESTAMP": "2019-01-02T04:04:51.931+01:00", "BATCH_SOFTWARE": "1.0", "BATCH_HARDWARE": "1.0", "PLANT_LINE": "PRODLINE1", "PLANT_TESTER": "TESTER1", "PRODUCT_ID": "3", "PRODUCT_NAME": "MOTHERBOARD", "GROUP_ID": "1", "GROUP_NAME": "TESTGROUP1", "TEST_ID": "10", "TEST_NAME": "VOLTAGETEST", "TEST_VALUE": "2.34523", "TEST_STATUS": "OK"}
{"BATCH_TIMESTAMP": "2019-01-02T04:04:51.931+01:00", "BATCH_SOFTWARE": "1.0", "BATCH_HARDWARE": "1.0", "PLANT_LINE": "PRODLINE1", "PLANT_TESTER": "TESTER1", "PRODUCT_ID": "3", "PRODUCT_NAME": "MOTHERBOARD", "GROUP_ID": "1", "GROUP_NAME": "TESTGROUP1", "TEST_ID": "20", "TEST_NAME": "INTEGRATIONTEST", "TEST_VALUE": "1", "TEST_STATUS": "NOK"}
{"BATCH_TIMESTAMP": "2019-01-02T04:04:51.931+01:00", "BATCH_SOFTWARE": "1.0", "BATCH_HARDWARE": "1.0", "PLANT_LINE": "PRODLINE1", "PLANT_TESTER": "TESTER1", "PRODUCT_ID": "3", "PRODUCT_NAME": "MOTHERBOARD", "GROUP_ID": "2", "GROUP_NAME": "TESTGROUP2", "TEST_ID": "10", "TEST_NAME": "VISUALTEST", "TEST_VALUE": "22", "TEST_STATUS": "OK"}

Is it possible to achieve this without any programming skills? If so, could you give me a hint about which plugins/modules I should look at?
Thanks in advance!

Yes, you can parse this kind of log by using a multiline setting in the input, so that all lines of one XML document end up in a single event.

If you're using Filebeat to ship the log, you can try this in Filebeat:

    multiline:
      pattern: '</BATCH>'
      negate: true
      match: before

Your XML is not valid XML. If I add a </PRODUCT> tag then, assuming you use a multiline codec to combine it into a single event, it can be parsed using

    xml { source => "message" target => "theXML" store_xml => true }
    split { field => "[theXML][PRODUCT][0][GROUP]" }
    split { field => "[theXML][PRODUCT][0][GROUP][TEST]" }

That will give you events that look like this:

    "theXML" => {
         "HARDWARE" => "1.0",
        "TIMESTAMP" => "2019-01-02T04:04:51.931+01:00",
         "SOFTWARE" => "1.0",
            "PLANT" => [
            [0] {
                "TESTER" => "TESTER1",
                  "NAME" => "PARIS",
                  "LINE" => "PRODLINE1"
            }
        ],
          "PRODUCT" => [
            [0] {
                    "ID" => "3",
                  "NAME" => "MOTHERBOARD",
                 "GROUP" => {
                    "TEST" => {
                        "STATUS" => "OK",
                            "ID" => "10",
                          "NAME" => "VISUALTEST",
                         "VALUE" => "22"
                    },
                      "ID" => "2",
                    "NAME" => "TESTGROUP2"
                },
                "STATUS" => "OK"
            }
        ]
    }

Then you can use mutate+rename to move the fields around any way you want.
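As a sketch of that last step, assuming the xml and split filters above have already run (so each event holds a single GROUP hash and a single TEST hash), a mutate filter along these lines could produce the flat field names from the question. The target names are copied from the desired output; dropping theXML and message at the end is an optional assumption that you don't want the raw XML stored in Elasticsearch.

    filter {
      # Assumes the xml + split filters shown above ran first, so
      # [GROUP] and [TEST] are single hashes in every event.
      mutate {
        rename => {
          "[theXML][TIMESTAMP]"                       => "BATCH_TIMESTAMP"
          "[theXML][SOFTWARE]"                        => "BATCH_SOFTWARE"
          "[theXML][HARDWARE]"                        => "BATCH_HARDWARE"
          "[theXML][PLANT][0][LINE]"                  => "PLANT_LINE"
          "[theXML][PLANT][0][TESTER]"                => "PLANT_TESTER"
          "[theXML][PRODUCT][0][ID]"                  => "PRODUCT_ID"
          "[theXML][PRODUCT][0][NAME]"                => "PRODUCT_NAME"
          "[theXML][PRODUCT][0][GROUP][ID]"           => "GROUP_ID"
          "[theXML][PRODUCT][0][GROUP][NAME]"         => "GROUP_NAME"
          "[theXML][PRODUCT][0][GROUP][TEST][ID]"     => "TEST_ID"
          "[theXML][PRODUCT][0][GROUP][TEST][NAME]"   => "TEST_NAME"
          "[theXML][PRODUCT][0][GROUP][TEST][VALUE]"  => "TEST_VALUE"
          "[theXML][PRODUCT][0][GROUP][TEST][STATUS]" => "TEST_STATUS"
        }
      }
      # Optional: drop what is left of the nested structure and the raw XML text.
      mutate {
        remove_field => ["theXML", "message"]
      }
    }

Putting remove_field in its own mutate block makes it obvious that it runs only after all the renames are done.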


Thank you. I don't intend to use Filebeat; I want Logstash alone to read the files from the given location.

You are correct, my apologies for the confusion. I edited my initial post and closed the PRODUCT tag.
Thanks for the proposal, I will take a look at it.

You can also do the multiline joining in Logstash itself, as a codec in the input section.

Will check it out, thanks!

    codec => multiline {
      pattern => "</BATCH>"
      negate => true
      what => "next"
    }

You can use it like this inside a Logstash input plugin, for example the file input.
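For the Logstash-only setup asked about above (reading the files directly, no Filebeat), a minimal input/output sketch might look like this. The path, the index name and the max_lines value are hypothetical placeholders. With negate => true and what => "next", every line that does not contain </BATCH> is joined to the line after it, so each event ends with the closing tag and is emitted as soon as that tag is read. sincedb_path => "/dev/null" (Linux/macOS) makes Logstash re-read the files on every restart, which is handy while testing but probably not what you want in production.

    input {
      file {
        # Hypothetical location of the XML files; the file input needs an absolute path.
        path => "/path/to/xml/*.xml"
        start_position => "beginning"
        # Forget the read position between runs (testing only).
        sincedb_path => "/dev/null"
        codec => multiline {
          # Lines that do NOT contain </BATCH> are joined to the next line,
          # so one event = one whole <BATCH>...</BATCH> document.
          pattern => "</BATCH>"
          negate => true
          what => "next"
          # The default limit is 500 lines; this higher value is an assumption for large batches.
          max_lines => 10000
        }
      }
    }

    output {
      elasticsearch {
        hosts => ["http://localhost:9200"]
        # Hypothetical index name.
        index => "test-results"
      }
      # Also print events to the console to check the parsed fields.
      stdout { codec => rubydebug }
    }

One caveat, as far as I know: the file input only emits a line once it has seen the newline after it, so make sure each file ends with a newline after </BATCH>.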


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.