XML filter in Logstash

Hi All,

I want to match each measurement name with its value. Both are linked through the attribute p. How can I extract each measType name and pair it with the r value that has the same p, for each measObjLdn object in the list?

Below is the sample XML file.

<measData>
 <managedElement swVersion="CXP9024418_6 R67D23"/>
 <measInfo measInfoId="PM=1,PmGroup=FieldReplaceableUnit">
   <job jobId="PREDEF_Nc"/>
   <granPeriod duration="PT900S"
               endTime="2020-09-22T10:30:00+00:00"/>
   <repPeriod duration="PT900S"/>
   <measType p="1">pmPowerFailure</measType>
   <measType p="2">pmUnitTemperatureLevel</measType>
   <measValue measObjLdn="ManagedElement=UXJD6109,Equipment=1,FieldReplaceableUnit=1">
     <r p="1"> </r>
     <r p="2">3,3,3</r>
   </measValue>
   <measValue measObjLdn="ManagedElement=UXJD6109,Equipment=1,FieldReplaceableUnit=RRU-1">
     <r p="1">0</r>
     <r p="2"> , , </r>
   </measValue>
   <measValue measObjLdn="ManagedElement=UXJD6109,Equipment=1,FieldReplaceableUnit=RRU-2">
     <r p="1">0</r>
     <r p="2"> , , </r>
   </measValue>
   <measValue measObjLdn="ManagedElement=UXJD6109,Equipment=1,FieldReplaceableUnit=RRU-3">
     <r p="1">0</r>
     <r p="2"> , , </r>
   </measValue>
   <measValue measObjLdn="ManagedElement=UXJD6109,Equipment=1,FieldReplaceableUnit=RRU-4">
     <r p="1">0</r>
     <r p="2"> , , </r>
   </measValue>
   <measValue measObjLdn="ManagedElement=UXJD6109,Equipment=1,FieldReplaceableUnit=RRU-5">
     <r p="1">0</r>
     <r p="2"> , , </r>
   </measValue>
   <measValue measObjLdn="ManagedElement=UXJD6109,Equipment=1,FieldReplaceableUnit=RRU-6">
     <r p="1">0</r>
     <r p="2"> , , </r>
   </measValue>
 </measInfo>
 <measInfo measInfoId="PM=1,PmGroup=Climate">
   <job jobId="PREDEF_Apc"/>
   <granPeriod duration="PT900S"
               endTime="2020-09-22T10:30:00+00:00"/>
   <repPeriod duration="PT900S"/>
   <measType p="1">pmCabinetFanSpeed</measType>
   <measType p="2">pmCabinetFanSpeedExternal</measType>
   <measType p="3">pmCabinetTemperature</measType>
   <measType p="4">pmSpmBarometricAirPressure</measType>
   <measType p="5">pmSpmDifferentialAirPressure</measType>
   <measValue measObjLdn="ManagedElement=UXJD6109,EquipmentSupportFunction=1,Climate=1">
     <r p="1"> , , </r>
     <r p="2"> , , </r>
     <r p="3"> , , </r>
     <r p="4"> </r>
     <r p="5"> </r>
     <suspect>true</suspect>
   </measValue>
 </measInfo>
 <measInfo measInfoId="PM=1,PmGroup=PowerDistribution">
   <job jobId="PREDEF_Apc"/>
   <granPeriod duration="PT900S"
               endTime="2020-09-22T10:30:00+00:00"/>
   <repPeriod duration="PT900S"/>
   <measType p="1">pmSystemVoltage</measType>
   <measValue measObjLdn="ManagedElement=UXJD6109,EquipmentSupportFunction=1,PowerDistribution=1">
     <r p="1"> , , , , , , , , , , , , , , </r>
     <suspect>true</suspect>
   </measValue>
 </measInfo>
 <measInfo measInfoId="PM=1,PmGroup=PowerSupply">
   <job jobId="PREDEF_Apc"/>
   <granPeriod duration="PT900S"
               endTime="2020-09-22T10:30:00+00:00"/>
   <repPeriod duration="PT900S"/>
   <measType p="1">pmPsuAcInputVoltageInterruption</measType>
   <measType p="2">pmPsuPowerLoad</measType>
   <measValue measObjLdn="ManagedElement=UXJD6109,EquipmentSupportFunction=1,PowerSupply=1">
     <r p="1"> , , , , , , , , , </r>
     <r p="2"> , , , , , , , , , , , , , , </r>
     <suspect>true</suspect>
   </measValue>
 </measInfo>
 <measInfo measInfoId="PM=1,PmGroup=SupportUnit">
   <job jobId="PREDEF_Apc"/>
   <granPeriod duration="PT900S"
               endTime="2020-09-22T10:30:00+00:00"/>
   <repPeriod duration="PT900S"/>
   <measType p="1">pmFanSpeed</measType>
   <measValue measObjLdn="ManagedElement=UXJD6109,Equipment=1,SupportUnit=1">
     <r p="1">35,35,35</r>
   </measValue>
 </measInfo>
 <measInfo measInfoId="PM=1,PmGroup=EthernetPort">
   <job jobId="PREDEF_Rtn"/>
   <granPeriod duration="PT900S"
               endTime="2020-09-22T10:30:00+00:00"/>
   <repPeriod duration="PT900S"/>
   <measType p="1">ifHCInBroadcastPkts</measType>
   <measType p="2">ifHCInMulticastPkts</measType>
   <measType p="3">ifHCInOctets</measType>
   <measType p="4">ifHCInUcastPkts</measType>
   <measType p="5">ifHCOutBroadcastPkts</measType>
   <measType p="6">ifHCOutMulticastPkts</measType>
   <measType p="7">ifHCOutOctets</measType>
   <measType p="8">ifHCOutUcastPkts</measType>
   <measType p="9">ifInDiscards</measType>
   <measType p="10">ifInErrors</measType>
   <measType p="11">ifInUnknownProtos</measType>
   <measType p="12">ifInUnknownTags</measType>
   <measType p="13">ifOutDiscards</measType>
   <measType p="14">ifOutErrors</measType>
   <measValue measObjLdn="ManagedElement=UXJD6109,Transport=1,EthernetPort=TN_C">
     <r p="1">0</r>
     <r p="2">22502</r>
     <r p="3">1489889227</r>
     <r p="4">5952986</r>
     <r p="5">0</r>
     <r p="6">0</r>
     <r p="7">419797118</r>
     <r p="8">3685665</r>
     <r p="9">4</r>
     <r p="10">0</r>
     <r p="11">0</r>
     <r p="12">0</r>
     <r p="13">0</r>
     <r p="14">0</r>
   </measValue>
 </measInfo>
 <measInfo measInfoId="PM=1,PmGroup=InterfaceIPv4">
   <job jobId="PREDEF_Rtn"/>
   <granPeriod duration="PT900S"
               endTime="2020-09-22T10:30:00+00:00"/>
   <repPeriod duration="PT900S"/>
   <measType p="1">ipIfStatsHCInOctets</measType>
   <measType p="2">ipIfStatsHCInReceives</measType>
   <measType p="3">ipIfStatsHCOutOctets</measType>
   <measType p="4">ipIfStatsHCOutTransmits</measType>
   <measType p="5">ipIfStatsInAddrErrors</measType>
   <measType p="6">ipIfStatsInDiscards</measType>
   <measType p="7">ipIfStatsInHdrErrors</measType>
   <measType p="8">ipIfStatsInNoRoutes</measType>
   <measType p="9">ipIfStatsInTruncatedPkts</measType>
   <measType p="10">ipIfStatsInUnknownProtos</measType>
   <measValue measObjLdn="ManagedElement=UXJD6109,Transport=1,Router=vr_IUB,InterfaceIPv4=IUB">
     <r p="1">1348364673</r>
     <r p="2">5952899</r>
     <r p="3">335133181</r>
     <r p="4">3685515</r>
     <r p="5">0</r>
     <r p="6">0</r>
     <r p="7">0</r>
     <r p="8">0</r>
     <r p="9">0</r>
     <r p="10">0</r>
   </measValue>
   <measValue measObjLdn="ManagedElement=UXJD6109,Transport=1,Router=vr_MUB,InterfaceIPv4=MUB">
     <r p="1">30798</r>
     <r p="2">121</r>
     <r p="3">72144</r>
     <r p="4">166</r>
     <r p="5">0</r>
     <r p="6">0</r>
     <r p="7">0</r>
     <r p="8">0</r>
     <r p="9">0</r>
     <r p="10">0</r>

That will require a lot of Ruby code. The following is incomplete and has no error checking. It never checks that a field exists before indexing into it, so minor deviations in the XML format will cause exceptions.

    xml { source => "message" target => "theXML" remove_field => [ "message" ] }
    ruby {
        code => '
            oldM = event.get("[theXML][measInfo]")
            newM = []

            oldM.each { |x|
                # Restructure each measInfo
                item = {}

                item["repPeriod"] = {}
                item["repPeriod"]["duration"] = x["repPeriod"][0]["duration"]

                types = x["measType"]
                oldValues = x["measValue"]
                newValues = []
                oldValues.each { |y|
                    value = {}
                    value["measObjLdn"] = y["measObjLdn"]
                    if y["suspect"]
                        value["suspect"] = y["suspect"][0]
                    end

                    newStuff = []
                    y["r"].each { |z|
                        someName = {}
                        index = z["p"].to_i - 1
                        type = types[index]["content"]
                        someName[type] = z["content"]
                        newStuff << someName
                    }
                    value["r"] = newStuff

                    newValues << value
                }
                item["measValue"] = newValues

                newM << item
            }
            event.set("measInfo", newM)
        '
    }

That will get you something like

  "measInfo" => [
    [0] {
        "repPeriod" => {
            "duration" => "PT900S"
        },
        "measValue" => [
            [0] {
                "measObjLdn" => "ManagedElement=UXJD6109,Equipment=1,FieldReplaceableUnit=1",
                         "r" => [
                    [0] {
                        "pmPowerFailure" => " "
                    },
                    [1] {
                        "pmUnitTemperatureLevel" => "3,3,3"
                    }
                ]
            },
            [1] {
                "measObjLdn" => "ManagedElement=UXJD6109,Equipment=1,FieldReplaceableUnit=RRU-1",
                         "r" => [
                    [0] {
                        "pmPowerFailure" => "0"
                    },
                    [1] {
                        "pmUnitTemperatureLevel" => " , , "
                    }
                ]
            },
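
The missing error checking mentioned above could be handled roughly as follows. This is only a sketch in plain Ruby, outside Logstash, assuming each measInfo hash has the layout shown in this thread (attributes as keys, element text under "content", children as arrays). The helper name restructure_meas_info is hypothetical, and it looks up the measType name by its p value instead of by array position, which is a swapped-in variation on the code above.

```ruby
# Hypothetical helper: pairs each <r p="N"> value with the <measType p="N"> name.
# Input is one measInfo hash shaped like the parsed XML shown in this thread.
def restructure_meas_info(info)
  # Build a p => name lookup, skipping entries that lack either part.
  names = {}
  (info["measType"] || []).each do |t|
    next unless t.is_a?(Hash) && t["p"] && t["content"]
    names[t["p"]] = t["content"]
  end

  values = (info["measValue"] || []).map do |mv|
    out = { "measObjLdn" => mv["measObjLdn"] }
    suspect = mv["suspect"]
    out["suspect"] = suspect[0] if suspect.is_a?(Array) && !suspect.empty?
    out["r"] = (mv["r"] || []).each_with_object([]) do |r, acc|
      next unless r.is_a?(Hash)
      name = names[r["p"]]
      next if name.nil? # no matching measType: skip instead of raising
      acc << { name => r["content"] }
    end
    out
  end

  { "measValue" => values }
end

# Hand-built sample mirroring the first measValue of the XML above.
sample = {
  "measType" => [
    { "p" => "1", "content" => "pmPowerFailure" },
    { "p" => "2", "content" => "pmUnitTemperatureLevel" }
  ],
  "measValue" => [
    { "measObjLdn" => "ManagedElement=UXJD6109,Equipment=1,FieldReplaceableUnit=1",
      "r" => [
        { "p" => "1", "content" => " " },
        { "p" => "2", "content" => "3,3,3" },
        { "p" => "9", "content" => "orphan" } # no measType with p=9, gets skipped
      ] }
  ]
}
result = restructure_meas_info(sample)
```

Inside a ruby filter the same logic would be applied to each element of the measInfo array before calling event.set.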

Below is my config file with the code you suggested added, but I am getting a Ruby exception.
I can't figure out where the error lies.


input{
    file{
        path => "C:/elk/xml/sample_copy.xml"
        start_position => beginning
        sincedb_path => "NUL"
        }
    }
filter{
xml {   source => "message" 
        target => "theXML" 
        remove_field => [ "message" ] }
    ruby {
        code => 
            'oldM = event.get("[theXML][measInfo]")
            newM = []
            oldM.each { |x|
                # Restructure each measInfo
                item = {}
                item["repPeriod"] = {}
                item["repPeriod"]["duration"] = x["repPeriod"][0]["duration"]
                types = x["measType"]
                oldValues = x["measValue"]
                newValues = []
                oldValues.each { |y|
                    value = {}
                    value["measObjLdn"] = y["measObjLdn"]
                    if y["suspect"]
                        value["suspect"] = y["suspect"][0]
                    end
                    newStuff = []
                    y["r"].each { |z|
                        someName = {}
                        index = z["p"].to_i - 1
                        type = types[index]["content"]
                        someName[type] = z["content"]
                        newStuff << someName
                    }
                    value["r"] = newStuff
                    newValues << value
                }
                item["measValue"] = newValues
                newM << item
            }
            event.set("measInfo", newM)
          '
    }
}

output{
    elasticsearch{
        hosts => "localhost:9200"
        index => "sample_xml6"  
    }
    stdout{codec=>rubydebug}
} 

Below is the config file that I am trying to get working. I am not able to load the corresponding object value, measurement type and measurement value into the same array.
When I try the Ruby code you suggested, it returns nil for [theXML][msgInfo].

input{
    file{
        path => "C:/elk/xml/sample_copy.xml"
        start_position => beginning
        sincedb_path => "NUL"
        }
    }
filter {
  xml {
        source => "message"
        remove_namespaces => true
        store_xml => true
        target => "parsed"
        force_array => true
    }
ruby {
        code => 
           'oldM = event.get("[message][measInfo]")
            event.set("messageinfo",oldM)
           '
}
if [message] =~ /<measType/{
mutate {
    copy => {"message"=>"measType"}
    add_field => {"Name" => "%{[parsed][content]}"}
    add_field => {"Key"=>"%{[parsed][p]}"}
    add_field =>{"new"=> "%{Name} = %{Key}"}
    remove_field => ['message']
    }   
}
if [message]=~ /<r p=/{
    mutate{
        copy => {"message"=>"messVal"}
        add_field => {"value"=>"%{[parsed][content]}"}
        add_field => {"Key_p"=>"%{[parsed][p]}"}
        remove_field =>["message"]
    }
}

if [message]=~/<measValue measObjLdn/{
    mutate {
        split => {'message'=>'"'}
        add_field => {'msgObj'=>"%{[message][1]}"}
        remove_field => ['message']
        split => {'msgObj' => ","}
        remove_field => ['tags']
    }
   
}
 if ("_xmlparsefailure" in [tags]) { drop {} }

ruby{
    code => "event.set('matching',[Hash['name',event.get('name'),'obj',event.get('msgObj'),'value',event.get('value')]])"
}

}
output{
    elasticsearch{
        hosts => "localhost:9200"
        index => "sample_xml5"
        
    }
    stdout{codec=>rubydebug}
}

Below is part of the output I am getting:


     "@timestamp" => 2020-09-30T10:42:45.766Z,
    "messageinfo" => nil,
       "matching" => [
        [0] {
             "name" => nil,
              "obj" => "ManagedElement=UXJD6109,Transport=1,Router=vr_MUB,InterfaceIPv4=MUB",
            "value" => nil
        }
    ],
           "path" => "C:/elk/xml/sample_copy.xml",
           "host" => "IN-00211416",
         "msgObj" => "ManagedElement=UXJD6109,Transport=1,Router=vr_MUB,InterfaceIPv4=MUB",
       "@version" => "1"
}
{
     "@timestamp" => 2020-09-30T10:42:45.722Z,
          "Key_p" => "1",
    "messageinfo" => nil,
       "matching" => [
        [0] {
             "name" => nil,
              "obj" => nil,
            "value" => " "
        }
    ],
           "path" => "C:/elk/xml/sample_copy.xml",
        "messVal" => "     <r p=\"1\"> </r>\r",
          "value" => " ",
           "host" => "IN-00211416",
         "parsed" => {
              "p" => "1",
        "content" => " "
    },
       "@version" => "1"
}
{
     "@timestamp" => 2020-09-30T10:42:45.727Z,
          "Key_p" => "1",
    "messageinfo" => nil,
       "matching" => [
        [0] {
             "name" => nil,
              "obj" => nil,
            "value" => "0"
        }
    ],
           "path" => "C:/elk/xml/sample_copy.xml",
        "messVal" => "     <r p=\"1\">0</r>\r",
          "value" => "0",
           "host" => "IN-00211416",
         "parsed" => {
              "p" => "1",
        "content" => "0"
    },
       "@version" => "1"
}
{
     "@timestamp" => 2020-09-30T10:42:45.729Z,
          "Key_p" => "1",
    "messageinfo" => nil,
       "matching" => [
        [0] {
             "name" => nil,
              "obj" => nil,
            "value" => "0"
        }
    ],
           "path" => "C:/elk/xml/sample_copy.xml",
        "messVal" => "     <r p=\"1\">0</r>\r",
          "value" => "0",
           "host" => "IN-00211416",
         "parsed" => {
              "p" => "1",
        "content" => "0"
    },
       "@version" => "1"
}
{
     "@timestamp" => 2020-09-30T10:42:45.732Z,
          "Key_p" => "1",
    "messageinfo" => nil,
       "matching" => [
        [0] {
             "name" => nil,
              "obj" => nil,
            "value" => "0"
        }
    ],
           "path" => "C:/elk/xml/sample_copy.xml",
        "messVal" => "     <r p=\"1\">0</r>\r",
          "value" => "0",
           "host" => "IN-00211416",
         "parsed" => {
              "p" => "1",
        "content" => "0"
    },
       "@version" => "1"
}
{
     "@timestamp" => 2020-09-30T10:42:45.737Z,
          "Key_p" => "2",
    "messageinfo" => nil,
       "matching" => [
        [0] {
             "name" => nil,
              "obj" => nil,
            "value" => " , , "
        }
    ],
           "path" => "C:/elk/xml/sample_copy.xml",
        "messVal" => "     <r p=\"2\"> , , </r>\r",
          "value" => " , , ",
           "host" => "IN-00211416",
         "parsed" => {
              "p" => "2",
        "content" => " , , "
    },
       "@version" => "1"
}
{
     "@timestamp" => 2020-09-30T10:42:45.754Z,
          "Key_p" => "4",
    "messageinfo" => nil,
       "matching" => [
        [0] {
             "name" => nil,
              "obj" => nil,
            "value" => "5952986"
        }
    ],
           "path" => "C:/elk/xml/sample_copy.xml",
        "messVal" => "     <r p=\"4\">5952986</r>\r",
          "value" => "5952986",
           "host" => "IN-00211416",
         "parsed" => {
              "p" => "4",
        "content" => "5952986"
    },
       "@version" => "1"
}
{
     "@timestamp" => 2020-09-30T10:42:45.757Z,
          "Key_p" => "12",
    "messageinfo" => nil,
       "matching" => [
        [0] {
             "name" => nil,
              "obj" => nil,
            "value" => "0"
        }
    ],
           "path" => "C:/elk/xml/sample_copy.xml",
        "messVal" => "     <r p=\"12\">0</r>\r",
          "value" => "0",
           "host" => "IN-00211416",
         "parsed" => {
              "p" => "12",
        "content" => "0"
    },
       "@version" => "1"
}
{
     "@timestamp" => 2020-09-30T10:42:45.764Z,
          "Key_p" => "4",
    "messageinfo" => nil,
       "matching" => [
        [0] {
             "name" => nil,
              "obj" => nil,
            "value" => "3685515"
        }
    ],
           "path" => "C:/elk/xml/sample_copy.xml",
        "messVal" => "     <r p=\"4\">3685515</r>\r",
          "value" => "3685515",
           "host" => "IN-00211416",
         "parsed" => {
              "p" => "4",
        "content" => "3685515"
    },
       "@version" => "1"
}
{
     "@timestamp" => 2020-09-30T10:42:45.768Z,
          "Key_p" => "8",
    "messageinfo" => nil,
       "matching" => [
        [0] {
             "name" => nil,
              "obj" => nil,
            "value" => "0"
        }
    ],
           "path" => "C:/elk/xml/sample_copy.xml",
        "messVal" => "     <r p=\"8\">0</r>\r",
          "value" => "0",
           "host" => "IN-00211416",
         "parsed" => {
              "p" => "8",
        "content" => "0"
    },
       "@version" => "1"
}
{
     "@timestamp" => 2020-09-30T10:42:45.743Z,
            "Key" => "2",
       "matching" => [
        [0] {
             "name" => nil,
              "obj" => nil,
            "value" => nil
        }
    ],
            "new" => "pmPsuPowerLoad = 2",
           "path" => "C:/elk/xml/sample_copy.xml",
           "Name" => "pmPsuPowerLoad",
       "measType" => "   <measType p=\"2\">pmPsuPowerLoad</measType>\r",
           "host" => "IN-00211416",
         "parsed" => {
              "p" => "2",
        "content" => "pmPsuPowerLoad"
    },
       "@version" => "1",
    "messageinfo" => nil
}
{
     "@timestamp" => 2020-09-30T10:42:45.750Z,
            "Key" => "3",
       "matching" => [
        [0] {
             "name" => nil,
              "obj" => nil,
            "value" => nil
        }
    ],
            "new" => "ifHCInOctets = 3",
           "path" => "C:/elk/xml/sample_copy.xml",
           "Name" => "ifHCInOctets",
       "measType" => "   <measType p=\"3\">ifHCInOctets</measType>\r",
           "host" => "IN-00211416",
         "parsed" => {
              "p" => "3",
        "content" => "ifHCInOctets"
    },
       "@version" => "1",
    "messageinfo" => nil
}
{
     "@timestamp" => 2020-09-30T10:42:45.752Z,
            "Key" => "11",
       "matching" => [
        [0] {
             "name" => nil,
              "obj" => nil,
            "value" => nil
        }
    ],
            "new" => "ifInUnknownProtos = 11",
           "path" => "C:/elk/xml/sample_copy.xml",
           "Name" => "ifInUnknownProtos",
       "measType" => "   <measType p=\"11\">ifInUnknownProtos</measType>\r",
           "host" => "IN-00211416",
         "parsed" => {
              "p" => "11",
        "content" => "ifInUnknownProtos"
    },
       "@version" => "1",
    "messageinfo" => nil
}
{
     "@timestamp" => 2020-09-30T10:42:45.762Z,
            "Key" => "7",
       "matching" => [
        [0] {
             "name" => nil,
              "obj" => nil,
            "value" => nil
        }
    ],
            "new" => "ipIfStatsInHdrErrors = 7",
           "path" => "C:/elk/xml/sample_copy.xml",
           "Name" => "ipIfStatsInHdrErrors",
       "measType" => "   <measType p=\"7\">ipIfStatsInHdrErrors</measType>\r",
           "host" => "IN-00211416",
         "parsed" => {
              "p" => "7",
        "content" => "ipIfStatsInHdrErrors"
    },
       "@version" => "1",
    "messageinfo" => nil
}
{
     "@timestamp" => 2020-09-30T10:42:45.734Z,
        "message" => "   <repPeriod duration=\"PT900S\"/>\r",
    "messageinfo" => nil,
       "matching" => [
        [0] {
             "name" => nil,
              "obj" => nil,
            "value" => nil
        }
    ],
           "path" => "C:/elk/xml/sample_copy.xml",
           "host" => "IN-00211416",
         "parsed" => {
        "duration" => "PT900S"
    },
       "@version" => "1"
}
{
     "@timestamp" => 2020-09-30T10:42:45.739Z,
        "message" => "   <job jobId=\"PREDEF_Apc\"/>\r",
    "messageinfo" => nil,
       "matching" => [
        [0] {
             "name" => nil,
              "obj" => nil,
            "value" => nil
        }
    ],
           "path" => "C:/elk/xml/sample_copy.xml",
           "host" => "IN-00211416",
         "parsed" => {
        "jobId" => "PREDEF_Apc"
    },
       "@version" => "1"
}
{
     "@timestamp" => 2020-09-30T10:42:45.745Z,
        "message" => "   <job jobId=\"PREDEF_Apc\"/>\r",
    "messageinfo" => nil,
       "matching" => [
        [0] {
             "name" => nil,
              "obj" => nil,
            "value" => nil
        }
    ],
           "path" => "C:/elk/xml/sample_copy.xml",
           "host" => "IN-00211416",
         "parsed" => {
        "jobId" => "PREDEF_Apc"
    },
       "@version" => "1"
}

That field does not exist. Did you mean [parsed][measInfo]?

Yes, I mean [parsed][measInfo].

Are you consuming the entire file as a single event or are you trying to parse each line separately?

It is not being consumed as a single event; I don't know whether that is because of the xml filter or not.
I am also not able to parse each line separately.
If either of the methods is possible, then I can use split and get the desired values into fields.

If you consume the file one line at a time, which is what a file input does by default, then many of your lines will not be valid XML and the xml filter will not parse them. For example ...

<measInfo measInfoId="PM=1,PmGroup=FieldReplaceableUnit">

is not a valid XML document, since the opening measInfo tag is not closed.
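
To see the difference between a single line and a complete element, here is a small check in plain Ruby. It is only a sketch: it uses REXML from the Ruby standard library as a stand-in parser (an assumption for illustration, not necessarily what the xml filter does internally), and well_formed? is a hypothetical helper name.

```ruby
require "rexml/document"

# Hypothetical helper: true if the text parses as a complete XML document.
def well_formed?(text)
  REXML::Document.new(text)
  true
rescue REXML::ParseException
  false
end

# One line of the file on its own: the opening tag is never closed.
single_line = '<measInfo measInfoId="PM=1,PmGroup=FieldReplaceableUnit">'

# A complete measInfo element, as it would look in a single multi-line event.
whole_element = '<measInfo measInfoId="PM=1,PmGroup=SupportUnit">' \
                '<measType p="1">pmFanSpeed</measType></measInfo>'

# A line that happens to contain a complete element by itself.
leaf_line = '<r p="1">0</r>'

puts well_formed?(single_line)
puts well_formed?(whole_element)
puts well_formed?(leaf_line)
```

The last case is why lines such as the r and measType elements did get a "parsed" field in the output above, while the lines that only open or close a container did not.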

If you want to consume the whole file as one event then this post shows you how to do it.

Even when consuming the file as a single event, the XML file is too large, so I get a multiline codec error because the maximum number of lines has been exceeded.

For smaller files, the xpath splitting and naming is not working.

OK, so change the multiline codec so that it picks up everything up to a line containing </measInfo>, then clean up the junk at the front

mutate { gsub => [ "message", "(?m)\A.*?(<measInfo\s)", "\1" ] }
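
The clean-up step can be tried out in plain Ruby before putting it into the pipeline. This is a sketch under assumptions: the event text is a hand-built example of what the first multi-line event might contain, and the pattern is one that allows for attributes on the measInfo tag and for newlines in the leading junk.

```ruby
# Assumed content of the first multi-line event: leading lines from the
# top of the file, followed by one complete measInfo element.
event_text = [
  '<measData>',
  ' <managedElement swVersion="CXP9024418_6 R67D23"/>',
  ' <measInfo measInfoId="PM=1,PmGroup=SupportUnit">',
  '   <measType p="1">pmFanSpeed</measType>',
  ' </measInfo>'
].join("\n")

# /m lets "." match newlines; the non-greedy ".*?" stops at the first
# "<measInfo" followed by whitespace, which is kept via the capture group.
cleaned = event_text.sub(/\A.*?(<measInfo\s)/m, '\1')

puts cleaned
```

An event that already starts with the measInfo tag passes through unchanged, since the non-greedy prefix then matches nothing.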

I will try what you suggested and see whether I get the desired output.