Hello All,
I'm learning the basics of Logstash and am looking for a way to convert an .xml file into multiple events (separate JSON documents) to be sent to Elasticsearch. The catch is that the content of individual XML elements varies: one event might carry test info, another might not, and so on. More details below. Each .xml file consists of two lines: the XML declaration and one content line. I'm only interested in the content line. Here's an example source file:
<?xml version="1.0" encoding="UTF-8"?>
<BATCH TIMESTAMP="2019-01-02T04:04:51.931+01:00" SOFTWARE="1.0"><PLANT NAME="PARIS" LINE="PRODLINE1"/><PRODUCT ID="3" NAME="MOTHERBOARD"><GROUP ID="1" NAME="TESTGROUP1"><TEST ID="10" NAME="VOLTAGETEST" VALUE="2.34523" STATUS="OK"/><TEST ID="20" NAME="INTEGRATION" VALUE="1.00000" STATUS="NOK"/></GROUP><GROUP ID="2" NAME="CHANGEOVER">DONE</GROUP><GROUP ID="3" NAME="NOTIFICATION"><EXTRA TEXT="OK"/></GROUP><GROUP ID="4" NAME="VOLTAGE_NOTIF"><TEST ID="10" NAME="VOLTAGETEST" VALUE="4.00001" STATUS="NOK"/><EXTRA TEXT="OK"/></GROUP></PRODUCT></BATCH>
Formatted for better readability:
<?xml version="1.0" encoding="UTF-8"?>
<BATCH TIMESTAMP="2019-01-02T04:04:51.931+01:00" SOFTWARE="1.0">
<PLANT NAME="PARIS" LINE="PRODLINE1"/>
<PRODUCT ID="3" NAME="MOTHERBOARD">
<GROUP ID="1" NAME="TESTGROUP1">
<TEST ID="10" NAME="VOLTAGETEST" VALUE="2.34523" STATUS="OK"/>
<TEST ID="20" NAME="INTEGRATION" VALUE="1.00000" STATUS="NOK"/>
</GROUP>
<GROUP ID="2" NAME="CHANGEOVER">DONE</GROUP>
<GROUP ID="3" NAME="NOTIFICATION">
<EXTRA TEXT="OK"/>
</GROUP>
<GROUP ID="4" NAME="VOLTAGE_NOTIF">
<TEST ID="10" NAME="VOLTAGETEST" VALUE="4.00001" STATUS="NOK"/>
<EXTRA TEXT="OK"/>
</GROUP>
</PRODUCT>
</BATCH>
And here's the desired result, 5 separate events:
{"BATCH_TIMESTAMP": "2019-01-02T04:04:51.931+01:00", "BATCH_SOFTWARE": "1.0", "PLANT_NAME": "PARIS", "PLANT_LINE": "PRODLINE1", "PRODUCT_ID": "3", "PRODUCT_NAME": "MOTHERBOARD", "GROUP_ID": "1", "GROUP_NAME": "TESTGROUP1", "GROUP_VALUE": "", "TEST_ID": "10", "TEST_NAME": "VOLTAGETEST", "TEST_VALUE": "2.34523", "TEST_STATUS": "OK", "EXTRA_TEXT": ""}
{"BATCH_TIMESTAMP": "2019-01-02T04:04:51.931+01:00", "BATCH_SOFTWARE": "1.0", "PLANT_NAME": "PARIS", "PLANT_LINE": "PRODLINE1", "PRODUCT_ID": "3", "PRODUCT_NAME": "MOTHERBOARD", "GROUP_ID": "1", "GROUP_NAME": "TESTGROUP1", "GROUP_VALUE": "", "TEST_ID": "20", "TEST_NAME": "INTEGRATION", "TEST_VALUE": "1.00000", "TEST_STATUS": "NOK", "EXTRA_TEXT": ""}
{"BATCH_TIMESTAMP": "2019-01-02T04:04:51.931+01:00", "BATCH_SOFTWARE": "1.0", "PLANT_NAME": "PARIS", "PLANT_LINE": "PRODLINE1", "PRODUCT_ID": "3", "PRODUCT_NAME": "MOTHERBOARD", "GROUP_ID": "2", "GROUP_NAME": "CHANGEOVER", "GROUP_VALUE": "DONE", "TEST_ID": "", "TEST_NAME": "", "TEST_VALUE": "", "TEST_STATUS": "", "EXTRA_TEXT": ""}
{"BATCH_TIMESTAMP": "2019-01-02T04:04:51.931+01:00", "BATCH_SOFTWARE": "1.0", "PLANT_NAME": "PARIS", "PLANT_LINE": "PRODLINE1", "PRODUCT_ID": "3", "PRODUCT_NAME": "MOTHERBOARD", "GROUP_ID": "3", "GROUP_NAME": "NOTIFICATION", "GROUP_VALUE": "", "TEST_ID": "", "TEST_NAME": "", "TEST_VALUE": "", "TEST_STATUS": "", "EXTRA_TEXT": "OK"}
{"BATCH_TIMESTAMP": "2019-01-02T04:04:51.931+01:00", "BATCH_SOFTWARE": "1.0", "PLANT_NAME": "PARIS", "PLANT_LINE": "PRODLINE1", "PRODUCT_ID": "3", "PRODUCT_NAME": "MOTHERBOARD", "GROUP_ID": "4", "GROUP_NAME": "VOLTAGE_NOTIF", "GROUP_VALUE": "", "TEST_ID": "10", "TEST_NAME": "VOLTAGETEST", "TEST_VALUE": "4.00001", "TEST_STATUS": "NOK", "EXTRA_TEXT": "OK"}
Formatted and explained below:
- Event 1 (Group 1, Test 10, No Extra Text):
{
"BATCH_TIMESTAMP":"2019-01-02T04:04:51.931+01:00",
"BATCH_SOFTWARE":"1.0",
"PLANT_NAME":"PARIS",
"PLANT_LINE":"PRODLINE1",
"PRODUCT_ID":"3",
"PRODUCT_NAME":"MOTHERBOARD",
"GROUP_ID":"1",
"GROUP_NAME":"TESTGROUP1",
"GROUP_VALUE":"",
"TEST_ID":"10",
"TEST_NAME":"VOLTAGETEST",
"TEST_VALUE":"2.34523",
"TEST_STATUS":"OK",
"EXTRA_TEXT":""
}
- Event 2 (Group 1, Test 20, No Extra Text):
{
"BATCH_TIMESTAMP":"2019-01-02T04:04:51.931+01:00",
"BATCH_SOFTWARE":"1.0",
"PLANT_NAME":"PARIS",
"PLANT_LINE":"PRODLINE1",
"PRODUCT_ID":"3",
"PRODUCT_NAME":"MOTHERBOARD",
"GROUP_ID":"1",
"GROUP_NAME":"TESTGROUP1",
"GROUP_VALUE":"",
"TEST_ID":"20",
"TEST_NAME":"INTEGRATION",
"TEST_VALUE":"1.00000",
"TEST_STATUS":"NOK",
"EXTRA_TEXT":""
}
- Event 3 (Group 2, No Test, No Extra Text):
{
"BATCH_TIMESTAMP":"2019-01-02T04:04:51.931+01:00",
"BATCH_SOFTWARE":"1.0",
"PLANT_NAME":"PARIS",
"PLANT_LINE":"PRODLINE1",
"PRODUCT_ID":"3",
"PRODUCT_NAME":"MOTHERBOARD",
"GROUP_ID":"2",
"GROUP_NAME":"CHANGEOVER",
"GROUP_VALUE":"DONE",
"TEST_ID":"",
"TEST_NAME":"",
"TEST_VALUE":"",
"TEST_STATUS":"",
"EXTRA_TEXT":""
}
- Event 4 (Group 3, No Test, Extra Text Present):
{
"BATCH_TIMESTAMP":"2019-01-02T04:04:51.931+01:00",
"BATCH_SOFTWARE":"1.0",
"PLANT_NAME":"PARIS",
"PLANT_LINE":"PRODLINE1",
"PRODUCT_ID":"3",
"PRODUCT_NAME":"MOTHERBOARD",
"GROUP_ID":"3",
"GROUP_NAME":"NOTIFICATION",
"GROUP_VALUE":"",
"TEST_ID":"",
"TEST_NAME":"",
"TEST_VALUE":"",
"TEST_STATUS":"",
"EXTRA_TEXT":"OK"
}
- Event 5 (Group 4, Test 10, Extra Text Present):
{
"BATCH_TIMESTAMP":"2019-01-02T04:04:51.931+01:00",
"BATCH_SOFTWARE":"1.0",
"PLANT_NAME":"PARIS",
"PLANT_LINE":"PRODLINE1",
"PRODUCT_ID":"3",
"PRODUCT_NAME":"MOTHERBOARD",
"GROUP_ID":"4",
"GROUP_NAME":"VOLTAGE_NOTIF",
"GROUP_VALUE":"",
"TEST_ID":"10",
"TEST_NAME":"VOLTAGETEST",
"TEST_VALUE":"4.00001",
"TEST_STATUS":"NOK",
"EXTRA_TEXT":"OK"
}
I've been trying to do this with the xml and split filters, but without success so far. I'd appreciate suggestions, especially on how to handle elements that are missing from the source (tests, extra text), as in Event 3.
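To make the flattening rule concrete, here is a minimal sketch in plain Python (not a Logstash config). It assumes the rule implied by the five events above: one event per TEST, or a single event for a GROUP that has no TEST, with the group's text and its EXTRA element copied onto each event and absent values emitted as empty strings. The name flatten_batch is hypothetical, and so is the assumption that a group has at most one EXTRA element. The same loop could presumably be ported to a ruby filter, but I have not verified that.

```python
import json
import xml.etree.ElementTree as ET

# The content line of the example file (the XML declaration line is skipped).
SAMPLE = (
    '<BATCH TIMESTAMP="2019-01-02T04:04:51.931+01:00" SOFTWARE="1.0">'
    '<PLANT NAME="PARIS" LINE="PRODLINE1"/>'
    '<PRODUCT ID="3" NAME="MOTHERBOARD">'
    '<GROUP ID="1" NAME="TESTGROUP1">'
    '<TEST ID="10" NAME="VOLTAGETEST" VALUE="2.34523" STATUS="OK"/>'
    '<TEST ID="20" NAME="INTEGRATION" VALUE="1.00000" STATUS="NOK"/>'
    '</GROUP>'
    '<GROUP ID="2" NAME="CHANGEOVER">DONE</GROUP>'
    '<GROUP ID="3" NAME="NOTIFICATION"><EXTRA TEXT="OK"/></GROUP>'
    '<GROUP ID="4" NAME="VOLTAGE_NOTIF">'
    '<TEST ID="10" NAME="VOLTAGETEST" VALUE="4.00001" STATUS="NOK"/>'
    '<EXTRA TEXT="OK"/>'
    '</GROUP>'
    '</PRODUCT></BATCH>'
)

TEST_FIELDS = ("ID", "NAME", "VALUE", "STATUS")


def attr(element, name):
    """Return an attribute value, or "" when the element or attribute is absent."""
    return element.get(name, "") if element is not None else ""


def flatten_batch(xml_line):
    """Hypothetical helper: one flat dict per TEST, or one per GROUP without a TEST."""
    batch = ET.fromstring(xml_line)
    plant = batch.find("PLANT")
    events = []
    for product in batch.findall("PRODUCT"):
        for group in product.findall("GROUP"):
            # Fields shared by every event produced from this group.
            common = {
                "BATCH_TIMESTAMP": attr(batch, "TIMESTAMP"),
                "BATCH_SOFTWARE": attr(batch, "SOFTWARE"),
                "PLANT_NAME": attr(plant, "NAME"),
                "PLANT_LINE": attr(plant, "LINE"),
                "PRODUCT_ID": attr(product, "ID"),
                "PRODUCT_NAME": attr(product, "NAME"),
                "GROUP_ID": attr(group, "ID"),
                "GROUP_NAME": attr(group, "NAME"),
                # Text directly inside GROUP, e.g. "DONE"; whitespace-only becomes "".
                "GROUP_VALUE": (group.text or "").strip(),
            }
            extra = group.find("EXTRA")  # assumption: at most one EXTRA per group
            # [None] stands in for "no TEST", so the group still yields one event.
            for test in group.findall("TEST") or [None]:
                event = dict(common)
                for field in TEST_FIELDS:
                    event["TEST_" + field] = attr(test, field)
                event["EXTRA_TEXT"] = attr(extra, "TEXT")
                events.append(event)
    return events


events = flatten_batch(SAMPLE)
for event in events:
    print(json.dumps(event))
```

The point of the `or [None]` placeholder is that groups without tests (Events 3 and 4) go through the same code path as groups with tests, so every event ends up with the same set of keys.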
Thanks in advance!