Parse inconsistent XML trace using Logstash

I am very new to ELK and need your guidance on the use case described below.

We have a lot of XML files generated by a source server (around 55 GB per day), but from each file we only need a few lines.

Two parameters (CO_ID_PUB and DIRNUM) are of interest in each XML trace containing transactions.

However, each transaction may or may not contain both of them. For example:

  1. Both CO_ID_PUB and DIRNUM are available
<COMMAND name="AAA.READ" timestamp="1574339699569" so="103">
<SVLOBJECT>
            <STRING name="CO_ID_PUB" val="FFFFFFF"/>
        </SVLOBJECT>
        <RESULT>
          ...
          <STRING name="DIRNUM" val="FFFFFFF"/>
          ....
</COMMAND>
<TX_COMMIT timestamp="1574339699585" so="103"/>
  2. Only CO_ID_PUB is available
<COMMAND name="BBB.READ" timestamp="1574339699569" so="103">
<SVLOBJECT>
            <STRING name="CO_ID_PUB" val="FFFFFFF"/>
        </SVLOBJECT>
        <RESULT>
          ...
          ....
</COMMAND>
<TX_COMMIT timestamp="1574339699585" so="103"/>
  3. Only DIRNUM is available
<COMMAND name="CCC.READ" timestamp="1574339699569" so="103">
<SVLOBJECT>
            <STRING name="DIRNUM" val="FFFFFFF"/>
        </SVLOBJECT>
        <RESULT>
          ...
          ....
</COMMAND>
<TX_COMMIT timestamp="1574339699585" so="103"/>
  4. Neither CO_ID_PUB nor DIRNUM is available

Each transaction follows the pattern below:

<COMMAND name="AAA.READ" timestamp="1574339699569" so="103">
<SVLOBJECT>
            <STRING name="CO_ID_PUB" val="FFFFFFF"/>
        </SVLOBJECT>
        <RESULT>
....
</COMMAND>
<TX_COMMIT timestamp="1574339699585" so="103"/>

But as I mentioned in the 4 scenarios, only when one or both of those 2 fields are present should that transaction's data, such as the start timestamp and end timestamp, be stored in ES.

The rest of the XML fields need to be dropped.

... ...

Then I need to store data like below:

{ CO_ID_PUB:FFFFFFF, DIRNUM:KKKKK, StartTime:1574339699569, EndTime:1574339699585, name:AAA.READ } or

{ DIRNUM:FFFFFFF, StartTime:1574339699569, EndTime:1574339699585, name:AAA.READ }

Once this data is stored, users will query it to get the transaction response time by providing a CO_ID_PUB or DIRNUM.

Can you help me with which Logstash filter (xml or grok) and logic to use to develop the parser?

Assuming that the <RESULT> tags are closed, and that the entire XML is wrapped in a single element (an XML filter will not parse multiple roots) so that your XML items look something like

   "message" => "<a> <COMMAND name=\"BBB.READ\" timestamp=\"1574339699569\" so=\"103\"> <SVLOBJECT> <STRING name=\"CO_ID_PUB\" val=\"FFFFFFF\"/> </SVLOBJECT> <RESULT> </RESULT> </COMMAND> <TX_COMMIT timestamp=\"1574339699585\" so=\"103\"/> </a>",

then you could parse with an XML filter and use ruby to extract the elements you want. I do not know enough about xpath to say whether it could be done using that.
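To get each COMMAND ... TX_COMMIT block into a single event in the first place, one option would be a multiline codec on the file input plus a mutate to wrap the block in a root element. This is only a sketch: the path is hypothetical, and the pattern assumes every transaction starts with a line beginning with <COMMAND.

    input {
        file {
            # hypothetical path - point this at your trace files
            path => "/var/log/traces/*.xml"
            start_position => "beginning"
            codec => multiline {
                # start a new event at every line beginning with <COMMAND and
                # append everything else (including the TX_COMMIT line) to it
                pattern => "^<COMMAND"
                negate => true
                what => "previous"
                auto_flush_interval => 5
            }
        }
    }
    filter {
        # wrap each block in a single root element so the xml filter can parse it
        mutate { replace => { "message" => "<a>%{message}</a>" } }
    }

With events shaped like that, the filter section could be: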

    xml {
        source => "message"
        store_xml => true
        # keep the parsed tree out of the final document
        target => "[@metadata][theXML]"
    }
    ruby {
        code => '
            xml = event.get("[@metadata][theXML]")
            if xml
                # store_xml parses every element into an array, hence the 0 indexes;
                # dig returns nil instead of raising if SVLOBJECT or RESULT is missing
                svl = xml.dig("COMMAND", 0, "SVLOBJECT", 0)
                res = xml.dig("COMMAND", 0, "RESULT", 0)
                # DIRNUM can appear under RESULT...
                if res && res["STRING"] && res["STRING"][0]["name"] == "DIRNUM"
                    event.set("DIRNUM", res["STRING"][0]["val"])
                end
                # ...or under SVLOBJECT
                if svl && svl["STRING"] && svl["STRING"][0]["name"] == "DIRNUM"
                    event.set("DIRNUM", svl["STRING"][0]["val"])
                end
                if svl && svl["STRING"] && svl["STRING"][0]["name"] == "CO_ID_PUB"
                    event.set("CO_ID_PUB", svl["STRING"][0]["val"])
                end
            end
        '
    }
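
If you also want the command name and the start and end timestamps on the event (and to drop transactions where neither field was found, as in the fourth scenario), you could add a ruby filter like the following after the one above. This is a sketch that assumes the same XmlSimple structure, i.e. that the timestamps are attributes of COMMAND and TX_COMMIT.

    ruby {
        code => '
            xml = event.get("[@metadata][theXML]")
            if xml && xml["COMMAND"]
                cmd = xml["COMMAND"][0]
                event.set("name", cmd["name"]) if cmd["name"]
                event.set("StartTime", cmd["timestamp"]) if cmd["timestamp"]
                tx = xml["TX_COMMIT"]
                if tx && tx[0]["timestamp"]
                    event.set("EndTime", tx[0]["timestamp"])
                end
            end
            # fourth scenario: neither CO_ID_PUB nor DIRNUM was found, so discard the event
            event.cancel unless event.get("CO_ID_PUB") || event.get("DIRNUM")
        '
    }

Also note that the ruby code above only looks at the first STRING under SVLOBJECT and RESULT; if your real transactions contain several STRING elements there, you would need to iterate over the array rather than index element 0.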
