How to parse XML in Logstash

Hi, I'm wondering how I can parse the XML structure below in Logstash as a single event. Or would Fluentd be a better fit in this case?

These logs will be uploaded by Filebeat.

It is worth mentioning that within each measInfo element the measType values change dynamically depending on the measInfoId attribute. I would like help writing a configuration that uses a multiline codec, or some other way of breaking down the data, for loading into Elasticsearch.

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="MeasDataCollection.xsl"?>
<measCollecFile xmlns="http://www.3gpp.org/ftp/specs/archive/32_series/32.435#measCollec"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.3gpp.org/ftp/specs/archive/32_series/32.435#measCollec
http://www.3gpp.org/ftp/specs/archive/32_series/32.435#measCollec">
 
 <fileHeader fileFormatVersion="32.435 V7.0" vendorName="Company NN" dnPrefix="DC=a1.companyNN.com,SubNetwork=1,IRPAgent=1">
 <fileSender localDn="SubNetwork=CountryNN,MeContext=MEC-Gbg-1,ManagedElement=RNC-Gbg-1" elementType="RNC"/>
 <measCollec beginTime="2000-03-01T14:00:00+02:00"/>
 </fileHeader>
 <measData>
 <managedElement localDn="SubNetwork=CountryNN,MeContext=MEC-Gbg-1,ManagedElement=RNC-Gbg-1" userLabel="RNC Telecomville"/>
 <measInfo measInfoId="Node1">
 <job jobId="1231"/>
 <granPeriod duration="PT900S" endTime="2000-03-01T14:14:30+02:00"/>
 <repPeriod duration="PT1800S"/>
 <measType p="1">attTCHSeizures</measType>
 <measType p="2">succTCHSeizures</measType>
 <measType p="3">attImmediateAssignProcs</measType>
 <measType p="4">succImmediateAssignProcs</measType>
 <measValue measObjLdn="RncFunction=RF-1,UtranCell=Gbg-997">
 <r p="1">234</r>
 <r p="2">345</r>
 <r p="3">567</r>
 <r p="4">789</r>
 </measValue>
 <measValue measObjLdn="RncFunction=RF-1,UtranCell=Gbg-998">
 <r p="1">890</r>
 <r p="2">901</r>
 <r p="3">123</r>
 <r p="4">234</r>
 </measValue>
 <measValue measObjLdn="RncFunction=RF-1,UtranCell=Gbg-999">
 <r p="1">456</r>
 <r p="2">567</r>
 <r p="3">678</r>
 <r p="4">789</r>
 <suspect>true</suspect>
 </measValue>
 </measInfo>
 <measInfo measInfoId="Node2">
 <job jobId="1232"/>
 <granPeriod duration="PT1000s" endTime="2000-03-01T14:14:30+02:00"/>
 <repPeriod duration="PT1000S"/>
 <measType p="1">attTCHSeizures2</measType>
 <measType p="2">succTCHSeizures2</measType>
 <measType p="3">attImmediateAssignProcs2</measType>
 <measType p="4">succImmediateAssignProcs2</measType>
 <measValue measObjLdn="RncFunction=RF-1,UtranCell=Gbg-1000">
 <r p="1">234</r>
 <r p="2">345</r>
 <r p="3">567</r>
 <r p="4">789</r>
 </measValue>
 <measValue measObjLdn="RncFunction=RF-1,UtranCell=Gbg-1001">
 <r p="1">890</r>
 <r p="2">901</r>
 <r p="3">123</r>
 <r p="4">234</r>
 </measValue>
 <measValue measObjLdn="RncFunction=RF-1,UtranCell=Gbg-1002">
 <r p="1">456</r>
 <r p="2">567</r>
 <r p="3">678</r>
 <r p="4">789</r>
 <suspect>true</suspect>
 </measValue>
 </measInfo>
 </measData>
 <fileFooter>
 <measCollec endTime="2000-03-01T14:15:00+02:00"/>
 </fileFooter>
</measCollecFile> 

There are quite a few threads about this in this forum. Here is an example of parsing it using an xml filter and reformatting the data.

Many thanks for pointing that out; I will follow that example. Please don't close this topic. If I have any doubts or questions I will post them here.

OK, so I have read many of the topics and tried to merge parts of the code to parse my XML file, but unfortunately I have run into some problems.
I don't have much experience in creating such configurations. I based mine on a piece of your code from "Parsing measurement xml data using logstash - #3 by yagoza " . It does not work as intended. Could you take a look at the output as well as the config file itself?

Here are the files:
https://drive.google.com/drive/folders/1QLvwKP_A1VPdw89SL_GCTEp_-3cuL8D1?usp=sharing

If you want to share files then share them so that everyone can see them.

Sure, updated. The link now works for everyone.

Can you download the files now?

I can download it, I just cannot open a .7z file.

All the needed files have now been uploaded without an archive.

OK, so you have a five megabyte XML object. I read that in as a single event using a file input with a multiline codec. I had to add max_lines => 300000 max_bytes => 6000000 to the codec. I parse the XML using

xml { source => "message" target => "theXML" store_xml => true remove_field => [ "message" ] }

Whenever I can I use force_array => false, but there are elements that have a single entry in some places and multiple entries in others. We need these to have an identical structure, so we will keep both as arrays.
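To illustrate why the mixed shapes are a problem: with force_array => false, an element that occurs once is returned as a hash, while one that occurs several times is returned as an array. A minimal plain-Ruby sketch of normalizing both shapes follows. The helper name as_array is hypothetical and not part of Logstash; this is only an alternative to keeping the default arrays, not what the pipeline in this thread does.

```ruby
# Hypothetical helper: wrap a lone value so callers can always iterate.
# Kernel#Array is avoided because it would turn a Hash into key/value pairs.
def as_array(value)
  return [] if value.nil?
  value.is_a?(Array) ? value : [value]
end

single   = { "p" => "1", "content" => "0" }              # shape when there is one <r>
multiple = [single, { "p" => "2", "content" => "12" }]   # shape when there are several

puts as_array(single).length
puts as_array(multiple).length
```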

A five megabyte structure is expensive to process. So I deleted all lines from the file that contain p="[0-9][0-9]" or p="[0-9][0-9][0-9]", which removes 90% of the data but retains the basic structure. It just means arrays contain ten entries instead of hundreds.
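The trimming step could be done with any line filter. As a rough sketch in Ruby (the method and constant names are made up for illustration, and the two patterns above are combined into one), it might look like this:

```ruby
# Matches a p attribute with two or three digits, e.g. p="10" or p="123".
TWO_OR_THREE_DIGIT_P = /p="[0-9]{2,3}"/

# Keep only the lines that do not carry a two- or three-digit p attribute.
def trim_lines(lines)
  lines.reject { |line| line.match?(TWO_OR_THREE_DIGIT_P) }
end

sample = [
  '<measType p="1">a</measType>',
  '<measType p="10">b</measType>',
  '<r p="123">5</r>',
  '<r p="9">7</r>'
]
puts trim_lines(sample)
```

To apply it to a file, something like File.write("trimmed.xml", trim_lines(File.readlines("input.xml")).join) would work, where both file names are placeholders.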

Looking at the XML structure, [theXML] contains [xmlns], [fileHeader], [measData], and [fileFooter] objects. [measData] is an array of hashes. One entry in those hashes is [measInfo], which is also an array of hashes. Stripping away more entries, we have

      "measData" => [
        [0] {
                  "measInfo" => [
                [ 0] {
                    "measInfoId" => "DiaNode",
                     "measValue" => [
                        [0] {
                            "measObjLdn" => "DiaNode=NODE21fe.epc.mnc002.mcc260.3gppnetwork.org",
                                     "r" => [
                                [0] {
                                          "p" => "1",
                                    "content" => "0"
                                },
                                [1] {
                                          "p" => "2",
                                    "content" => "12"
                                } ...
                            ]
                        }
                    ],
                      "measType" => [
                        [0] {
                                  "p" => "1",
                            "content" => "Diameter.EgressAnswMsg.Info"
                        },
                        [1] {
                                  "p" => "2",
                            "content" => "Diameter.EgressAnswMsg.PermanentFailure"
                        } ...
                    ],
                    "granPeriod" => [
                        [0] {
                            "duration" => "PT300S",
                             "endTime" => "2021-09-15T15:05:00+02:00"
                        }
                    ],
                     "repPeriod" => [
                        [0] {
                            "duration" => "PT300S"
                        }
                    ],
                           "job" => [
                        [0] {
                            "jobId" => "NODE_System_SysDef_NOOSSCONTROL"
                        }
                    ]
                } ...

I want to join the measType "content" with the measValue "content" to make hash entries. [measValue], instead of being an array of hashes will be a hash with "measObjLdn" as the key and the value being a hash of the joined Type/Value pairs.

The code I tried was

    ruby {
        code => '
            measData = event.get("[theXML][measData]")
            newMeasData = {}

            # measData contains a managedElement object and an array of measInfo
            newMeasData["managedElement"] = measData[0]["managedElement"]
            measData[0]["measInfo"].each { |x|
                # Process a measInfo hash
                newMeasInfo = {}
                newMeasInfo["granPeriod"] = x["granPeriod"][0]
                newMeasInfo["repPeriod"] = x["repPeriod"][0]
                newMeasInfo["job"] = x["job"][0]

                # Convert measType array into a hash we can refer to with the "p" value
                measType = {}
                x["measType"].each { |y|
                    measType[y["p"]] = y["content"]
                }

                # Now work through the measValue array and do the joins
                newMeasValue = {}
                x["measValue"].each { |y|
                    dataHash = {}
                    y["r"].each { |z|
                        dataHash[measType[z["p"]]] = z["content"]
                    }
                    newMeasInfo[y["measObjLdn"]] = dataHash
                }
                newMeasData[x["measInfoId"]] = newMeasInfo
            }
            event.set("measData", newMeasData)
        '
    }
    mutate { remove_field => [ "theXML" ] }

Note that there are no arrays in the result; everything is a hash, which is easier to reference in Elasticsearch than an unsplit array. That said, I am unconvinced that the result is useful.

When you look at the data that this produces, it is not good. Consider the OsmPI data, which starts off with

                     "OsmPI" => {
                                        "LemService=OSMonitor,OsmPU=PL-8,OsmPT=vDicosVMHelper,OsmPI=7" => {
            "VS.LEM.CPULoad.Total" => "0"
        },
                                    "LemService=OSMonitor,OsmPU=PL-8,OsmPT=CDCLSvLogRotator,OsmPI=vm1" => {
            "VS.LEM.CPULoad.Total" => "0"
        },
                                    "LemService=OSMonitor,OsmPU=PL-4,OsmPT=CDCLSvLogRotator,OsmPI=vm4" => {
            "VS.LEM.CPULoad.Total" => "0"
        },

Obviously that would be better as a table that has a LemService column and a CPULoad column. Instead I have created a new column in the database for each row of data. It is beyond bonkers. So, I am now going to consider a different approach and see if we can retain some of the arrays and use split filters to generate multiple events that look more similar.

The following code

    ruby {
        code => '
            measData = event.get("[theXML][measData]")
            newMeasData = []

            # measData contains a managedElement object and an array of measInfo
            measData[0]["measInfo"].each { |x|
                # Process a measInfo hash
                newMeasInfo = []

                # Convert measType array into a hash we can refer to with the "p" value
                measType = {}
                x["measType"].each { |y|
                    measType[y["p"]] = y["content"]
                }

                # Now work through the measValue array and do the joins
                newMeasValue = {}
                x["measValue"].each { |y|
                    dataHash = {}
                    y["r"].each { |z|
                        dataHash[measType[z["p"]]] = z["content"]
                    }
                    dataHash["measInfoId"] = x["measInfoId"]
                    dataHash["measObjLdn"] = y["measObjLdn"]

                    #dataHash["managedElement"] = measData[0]["managedElement"][0]

                    #dataHash["granPeriod"] = x["granPeriod"][0]
                    #dataHash["repPeriod"] = x["repPeriod"][0]
                    #dataHash["job"] = x["job"][0]

                    newMeasInfo << dataHash
                }
                newMeasData << newMeasInfo
            }
            event.set("measData", newMeasData)
        '
        remove_field => [ "theXML" ]
    }

    split { field => "measData" }
    split { field => "measData" }

will produce measData fields like

  "measData" => {
              "measInfoId" => "OsmPI",
              "measObjLdn" => "LemService=OSMonitor,OsmPU=PL-3,OsmPT=CDCLSvLogRotator,OsmPI=vm14",
    "VS.LEM.CPULoad.Total" => "0"
},

or

  "measData" => {
        "VS.LPM.LoadReg.Reject.Rate.Heap" => "0",
         "VS.LPM.LoadReg.Reject.Rate.Mem" => "0",
                             "measInfoId" => "VdVM",
    "VS.LPM.LoadReg.Reject.Rate.Tipc.Out" => "0",
        "VS.LPM.LoadReg.Reject.MultiMMap" => "0",
     "VS.LPM.LoadReg.Reject.Rate.Tipc.In" => "0",
         "VS.LPM.LoadReg.Reject.Rate.CPU" => "0",
                             "measObjLdn" => "LpmService=LPMSv,LpmPU=PL-5,VdVM=12",
            "VS.LPM.LoadReg.Reject.Total" => "0"
},

Uncomment the additional assignments to add all the data to every event.

However, this is far more memory intensive than the first approach.

Note that the code contains no error checking, so small changes in the data format may cause logstash to crash.
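As one example of the kind of checking that is missing, the join between measType and r could tolerate absent or malformed entries. This is a sketch only: the method name join_counters and the fallback key format are assumptions, not something from the code above.

```ruby
# Join <r> values to <measType> names by their "p" attribute.
# Anything that is not shaped as expected is skipped instead of raising.
def join_counters(meas_types, results)
  names = {}
  (meas_types.is_a?(Array) ? meas_types : []).each do |t|
    names[t["p"]] = t["content"] if t.is_a?(Hash) && t["p"]
  end

  joined = {}
  (results.is_a?(Array) ? results : []).each do |r|
    next unless r.is_a?(Hash)
    # Fallback name for an <r> with no matching <measType> (an assumption).
    key = names.fetch(r["p"], "unknown_p_#{r['p']}")
    joined[key] = r["content"]
  end
  joined
end

types = [{ "p" => "1", "content" => "attTCHSeizures" }]
rs    = [{ "p" => "1", "content" => "234" }, { "p" => "2", "content" => "345" }]
p join_counters(types, rs)
```

Inside the ruby filter this would replace the inner y["r"].each loop, called with x["measType"] and y["r"].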


So I've tried both approaches. I really appreciate your help.
From the first one:

input {
    file {
        path =>"/data/A20210915.1500+0200-1505+0200_node21.xml"
        codec => multiline { pattern => "</measInfo>" negate => true what => next max_lines => 300000 max_bytes => 6000000 auto_flush_interval => 1 }
        start_position => "beginning"
        type => "xml"

                }
}

filter {

        # if [message] =~ /<?xml/ { drop {} }





mutate {
                 gsub => [ "message", "[<]measCollecFile[^>]*[>]", "" ]
                 gsub => [ "message", "^[^_]*<measData(.)$", "" ]
                 gsub => [ "message", "[<]managedElement[^>]*[>]", "" ]
                 gsub => [ "message", "[^_]\/measData[^_]*$", "" ]
                 }


xml { source => "message" target => "theXML" store_xml => true remove_field => [ "message" ] }


    ruby {
        code => '
            measData = event.get("[theXML][measData]")
            newMeasData = {}

            # measData contains a managedElement object and an array of measInfo
            newMeasData["managedElement"] = measData[0]["managedElement"]
            measData[0]["measInfo"].each { |x|
                # Process a measInfo hash
                newMeasInfo = {}
                newMeasInfo["granPeriod"] = x["granPeriod"][0]
                newMeasInfo["repPeriod"] = x["repPeriod"][0]
                newMeasInfo["job"] = x["job"][0]

                # Convert measType array into a hash we can refer to with the "p" value
                measType = {}
                x["measType"].each { |y|
                    measType[y["p"]] = y["content"]
                }

                # Now work through the measValue array and do the joins
                newMeasValue = {}
                x["measValue"].each { |y|
                    dataHash = {}
                    y["r"].each { |z|
                        dataHash[measType[z["p"]]] = z["content"]
                    }
                    newMeasInfo[y["measObjLdn"]] = dataHash
                }
                newMeasData[x["measInfoId"]] = newMeasInfo
            }
            event.set("measData", newMeasData)
        '
    }
    mutate { remove_field => [ "theXML" ] }

output {
    stdout { codec=>rubydebug }
}

output

[2021-09-28T23:49:18,737][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}
[2021-09-28T23:49:19,469][ERROR][logstash.agent           ] Failed to execute action {:action=>LogStash::PipelineAction::Create/pipeline_id:main, :exception=>"LogStash::ConfigurationError", :message=>"Expected one of [ \\t\\r\\n], \"#\", \"=>\" at line 67, column 12 (byte 2163) after filter {\n\n        # if [message] =~ /<?xml/ { drop {} }\n\n\n\n\n\nmutate {\n                 gsub => [ \"message\", \"[<]measCollecFile[^>]*[>]\", \"\" ]\n                 gsub => [ \"message\", \"^[^_]*<measData(.)$\", \"\" ]\n                 gsub => [ \"message\", \"[<]managedElement[^>]*[>]\", \"\" ]\n                 gsub => [ \"message\", \"[^_]\\/measData[^_]*$\", \"\" ]\n                 }\n\n\nxml { source => \"message\" target => \"theXML\" store_xml => true remove_field => [ \"message\" ] }\n\n\n    ruby {\n        code => '\n            measData = event.get(\"[theXML][measData]\")\n            newMeasData = {}\n\n            # measData contains a managedElement object and an array of measInfo\n            newMeasData[\"managedElement\"] = measData[0][\"managedElement\"]\n            measData[0][\"measInfo\"].each { |x|\n                # Process a measInfo hash\n                newMeasInfo = {}\n                newMeasInfo[\"granPeriod\"] = x[\"granPeriod\"][0]\n                newMeasInfo[\"repPeriod\"] = x[\"repPeriod\"][0]\n                newMeasInfo[\"job\"] = x[\"job\"][0]\n\n                # Convert measType array into a hash we can refer to with the \"p\" value\n                measType = {}\n                x[\"measType\"].each { |y|\n                    measType[y[\"p\"]] = y[\"content\"]\n                }\n\n                # Now work through the measValue array and do the joins\n                newMeasValue = {}\n                x[\"measValue\"].each { |y|\n                    dataHash = {}\n                    y[\"r\"].each { |z|\n                        dataHash[measType[z[\"p\"]]] = z[\"content\"]\n                    }\n                    
newMeasInfo[y[\"measObjLdn\"]] = dataHash\n                }\n                newMeasData[x[\"measInfoId\"]] = newMeasInfo\n            }\n            event.set(\"measData\", newMeasData)\n        '\n    }\n    mutate { remove_field => [ \"theXML\" ] }\n\noutput {\n    stdout ", :backtrace=>["/usr/share/logstash/logstash-core/lib/logstash/compiler.rb:32:in `compile_imperative'", "org/logstash/execution/AbstractPipelineExt.java:184:in `initialize'", "org/logstash/execution/JavaBasePipelineExt.java:69:in `initialize'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:47:in `initialize'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline_action/create.rb:52:in `execute'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:389:in `block in converge_state'"]}
[2021-09-28T23:49:22,924][ERROR][logstash.agent           ] Failed to execute action {:action=>LogStash::PipelineAction::Create/pipeline_id:main, :exception=>"LogStash::ConfigurationError", :message=>"Expected one of [ \\t\\r\\n], \"#\", \"=>\" at line 67, column 12 (byte 2163) after filter {\n\n        # if [message] =~ /<?xml/ { drop {} }\n\n\n\n\n\nmutate {\n                 gsub => [ \"message\", \"[<]measCollecFile[^>]*[>]\", \"\" ]\n                 gsub => [ \"message\", \"^[^_]*<measData(.)$\", \"\" ]\n                 gsub => [ \"message\", \"[<]managedElement[^>]*[>]\", \"\" ]\n                 gsub => [ \"message\", \"[^_]\\/measData[^_]*$\", \"\" ]\n                 }\n\n\nxml { source => \"message\" target => \"theXML\" store_xml => true remove_field => [ \"message\" ] }\n\n\n    ruby {\n        code => '\n            measData = event.get(\"[theXML][measData]\")\n            newMeasData = {}\n\n            # measData contains a managedElement object and an array of measInfo\n            newMeasData[\"managedElement\"] = measData[0][\"managedElement\"]\n            measData[0][\"measInfo\"].each { |x|\n                # Process a measInfo hash\n                newMeasInfo = {}\n                newMeasInfo[\"granPeriod\"] = x[\"granPeriod\"][0]\n                newMeasInfo[\"repPeriod\"] = x[\"repPeriod\"][0]\n                newMeasInfo[\"job\"] = x[\"job\"][0]\n\n                # Convert measType array into a hash we can refer to with the \"p\" value\n                measType = {}\n                x[\"measType\"].each { |y|\n                    measType[y[\"p\"]] = y[\"content\"]\n                }\n\n                # Now work through the measValue array and do the joins\n                newMeasValue = {}\n                x[\"measValue\"].each { |y|\n                    dataHash = {}\n                    y[\"r\"].each { |z|\n                        dataHash[measType[z[\"p\"]]] = z[\"content\"]\n                    }\n                    
newMeasInfo[y[\"measObjLdn\"]] = dataHash\n                }\n                newMeasData[x[\"measInfoId\"]] = newMeasInfo\n            }\n            event.set(\"measData\", newMeasData)\n        '\n    }\n    mutate { remove_field => [ \"theXML\" ] }\n\noutput {\n    stdout ", :backtrace=>["/usr/share/logstash/logstash-core/lib/logstash/compiler.rb:32:in `compile_imperative'", "org/logstash/execution/AbstractPipelineExt.java:184:in `initialize'", "org/logstash/execution/JavaBasePipelineExt.java:69:in `initialize'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:47:in `initialize'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline_action/create.rb:52:in `execute'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:389:in `block in converge_state'"]}

=================
I saw some errors and the code didn't parse anything. Can you paste all of the code that you used to parse the XML? I'm not very good at Ruby, which is why I'm asking.

You are missing a } to close the filter section. Why are you doing all those mutate+gsub operations?

I'm using the regexes to deal with the closing tags. Without these mutate+gsub operations I got many "unclosed" warnings in the output, but something still looks strange:

[2021-09-29T00:15:08,667][WARN ][logstash.filters.xml     ][main][6a1dd3c0ea1a68400e6a643f954770d73f414b2370c1c190342430994307b78c] Error parsing xml with XmlSimple {:source=>"message", :value=>"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<?xml-stylesheet type=\"text/xsl\" href=\"MeasDataCollection.xsl\"?>\n\n\t<fileHeader fileFormatVersion=\"32.435 V10.0\" vendorName=\"Ericsson AB\" dnPrefix=\"SubNetwork=ONRM_RootMo_R,SubNetwork=NODE,MeContext=NODE21\">\n\n\t\t\n\t\t<measInfo measInfoId=\"DiaNode\">\n\t\t\t<job jobId=\"NODE_System_SysDef_NOOSSCONTROL\"/>\n\t\t\t<granPeriod duration=\"PT300S\" endTime=\"2021-09-15T15:05:00+02:00\"/>\n\t\t\t<repPeriod duration=\"PT300S\"/>\n\t\t\t<measType p=\"1\">Diameter.EgressAnswMsg.Info</measType>\n\t\t\t<measType p=\"2\">Diameter.EgressAnswMsg.PermanentFailure</measType>\n\t\t\t<measType p=\"3\">Diameter.EgressAnswMsg.ProtocolError</measType>\n\t\t\t<measType p=\"4\">Diameter.EgressAnswMsg.Success</measType>\n\t\t\t<measType p=\"5\">Diameter.EgressAnswMsg.TotalCount</measType>\n\t\t\t<measType p=\"6\">Diameter.EgressAnswMsg.TransientFailure</measType>\n\t\t\t<measType p=\"7\">Diameter.EgressAnswMsgConnectionMgmt.TotalCount</measType>\n\t\t\t<measType p=\"8\">Diameter.EgressAnswMsgDiscarded.Congestion</measType>\n\t\t\t<measType p=\"9\">Diameter.EgressAnswMsgDiscarded.ConnectionLost</measType>\n\t\t\t<measType p=\"10\">Diameter.EgressAnswMsgDiscarded.TimeOut</measType>\n\t\t\t<measType p=\"11\">Diameter.EgressAnswMsgDiscarded.TotalCount</measType>\n\t\t\t<measType p=\"12\">Diameter.EgressAnswMsgPosted.TotalCount</measType>\n\t\t\t<measType p=\"13\">Diameter.EgressReqMsg.TotalCount</measType>\n\t\t\t<measType p=\"14\">Diameter.EgressReqMsgConnectionMgmt.TotalCount</measType>\n\t\t\t<measType p=\"15\">Diameter.EgressReqMsgDiscarded.Congestion</measType>\n\t\t\t<measType p=\"16\">Diameter.EgressReqMsgDiscarded.ConnectionLost</measType>\n\t\t\t<measType p=\"17\">Diameter.EgressReqMsgDiscarded.Routing</measType>\n\t\t\t<measType 
p=\"18\">Diameter.EgressReqMsgDiscarded.TimeOut</measType>\n\t\t\t<measType p=\"19\">Diameter.EgressReqMsgDiscarded.TotalCount</measType>\n\t\t\t<measType p=\"20\">Diameter.EgressReqMsgPosted.TotalCount</measType>\n\t\t\t<measType p=\"21\">Diameter.EgressReqMsgResent.ConnectionLost</measType>\n\t\t\t<measType p=\"22\">Diameter.EgressReqMsgResent.ProtocolError</measType>\n\t\t\t<measType p=\"23\">Diameter.EgressReqMsgResent.TimeOut</measType>\n\t\t\t<measType p=\"24\">Diameter.EgressReqMsgResent.TotalCount</measType>\n\t\t\t<measType p=\"25\">Diameter.EgressReqMsgResent.TransientFailure</measType>\n\t\t\t<measType p=\"26\">Diameter.IngressAnswMsg.Info</measType>\n\t\t\t<measType p=\"27\">Diameter.IngressAnswMsg.Malformation</measType>\n\t\t\t<measType p=\"28\">Diameter.IngressAnswMsg.PermanentFailure</measType>\n\t\t\t<measType p=\"29\">Diameter.IngressAnswMsg.ProtocolError</measType>\n\t\t\t<measType p=\"30\">Diameter.IngressAnswMsg.Success</measType>\n\t\t\t<measType p=\"31\">Diameter.IngressAnswMsg.TotalCount</measType>\n\t\t\t<measType p=\"32\">Diameter.IngressAnswMsg.TransientFailure</measType>\n\t\t\t<measType p=\"33\">Diameter.IngressAnswMsgConnectionMgmt.TotalCount</measType>\n\t\t\t<measType p=\"34\">Diameter.IngressAnswMsgDelivered.TotalCount</measType>\n\t\t\t<measType p=\"35\">Diameter.IngressAnswMsgDiscarded.Congestion</measType>\n\t\t\t<measType p=\"36\">Diameter.IngressAnswMsgDiscarded.ConnectionLost</measType>\n\t\t\t<measType p=\"37\">Diameter.IngressAnswMsgDiscarded.Malformation</measType>\n\t\t\t<measType p=\"38\">Diameter.IngressAnswMsgDiscarded.TotalCount</measType>\n\t\t\t<measType p=\"39\">Diameter.IngressReqMsg.TotalCount</measType>\n\t\t\t<measType p=\"40\">Diameter.IngressReqMsgConnectionMgmt.TotalCount</measType>\n\t\t\t<measType p=\"41\">Diameter.IngressReqMsgDelivered.TotalCount</measType>\n\t\t\t<measType p=\"42\">Diameter.IngressReqMsgDiscarded.Congestion</measType>\n\t\t\t<measType 
p=\"43\">Diameter.IngressReqMsgDiscarded.ConnectionLost</measType>\n\t\t\t<measType p=\"44\">Diameter.IngressReqMsgDiscarded.LoadReg</measType>\n\t\t\t<measType p=\"45\">Diameter.IngressReqMsgDiscarded.Malformation</measType>\n\t\t\t<measType p=\"46\">Diameter.IngressReqMsgDiscarded.Routing</measType>\n\t\t\t<measType p=\"47\">Diameter.IngressReqMsgDiscarded.TotalCount</measType>\n\t\t\t<measType p=\"48\">Diameter.IngressReqMsgResent.TotalCount</measType>\n\t\t\t<measType p=\"49\">Diameter.RxBytes.Total</measType>\n\t\t\t<measType p=\"50\">Diameter.TxBytes.Total</measType>\n\t\t\t<measValue measObjLdn=\"DiaNode=NODE21fe.epc.mnc002.mcc260.3gppnetwork.org\">\n\t\t\t\t<r p=\"1\">0</r>\n\t\t\t\t<r p=\"2\">12</r>\n\t\t\t\t<r p=\"3\">0</r>\n\t\t\t\t<r p=\"4\">999931</r>\n\t\t\t\t<r p=\"5\">999943</r>\n\t\t\t\t<r p=\"6\">0</r>\n\t\t\t\t<r p=\"7\">148</r>\n\t\t\t\t<r p=\"8\">0</r>\n\t\t\t\t<r p=\"9\">0</r>\n\t\t\t\t<r p=\"10\">0</r>\n\t\t\t\t<r p=\"11\">0</r>\n\t\t\t\t<r p=\"12\">999735</r>\n\t\t\t\t<r p=\"13\">235698</r>\n\t\t\t\t<r p=\"14\">10</r>\n\t\t\t\t<r p=\"15\">0</r>\n\t\t\t\t<r p=\"16\">0</r>\n\t\t\t\t<r p=\"17\">0</r>\n\t\t\t\t<r p=\"18\">0</r>\n\t\t\t\t<r p=\"19\">0</r>\n\t\t\t\t<r p=\"20\">235688</r>\n\t\t\t\t<r p=\"21\">0</r>\n\t\t\t\t<r p=\"22\">0</r>\n\t\t\t\t<r p=\"23\">0</r>\n\t\t\t\t<r p=\"24\">0</r>\n\t\t\t\t<r p=\"25\">0</r>\n\t\t\t\t<r p=\"26\">0</r>\n\t\t\t\t<r p=\"27\">0</r>\n\t\t\t\t<r p=\"28\">286</r>\n\t\t\t\t<r p=\"29\">2</r>\n\t\t\t\t<r p=\"30\">232319</r>\n\t\t\t\t<r p=\"31\">235694</r>\n\t\t\t\t<r p=\"32\">0</r>\n\t\t\t\t<r p=\"33\">10</r>\n\t\t\t\t<r p=\"34\">235685</r>\n\t\t\t\t<r p=\"35\">0</r>\n\t\t\t\t<r p=\"36\">0</r>\n\t\t\t\t<r p=\"37\">0</r>\n\t\t\t\t<r p=\"38\">0</r>\n\t\t\t\t<r p=\"39\">999957</r>\n\t\t\t\t<r p=\"40\">148</r>\n\t\t\t\t<r p=\"41\">999744</r>\n\t\t\t\t<r p=\"42\">0</r>\n\t\t\t\t<r p=\"43\">0</r>\n\t\t\t\t<r p=\"44\">0</r>\n\t\t\t\t<r p=\"45\">0</r>\n\t\t\t\t<r p=\"46\">0</r>\n\t\t\t\t<r p=\"47\">0</r>\n\t\t\t\t<r 
p=\"48\">1</r>\n\t\t\t\t<r p=\"49\">532195660</r>\n\t\t\t\t<r p=\"50\">1001410364</r>\n\t\t\t</measValue>\n\t\t</measInfo>", :exception=>#<REXML::ParseException: No close tag for /fileHeader
Line: 113
Position: 5155
Last 80 unconsumed characters:
>, :backtrace=>["uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/parsers/treeparser.rb:28:in `parse'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/document.rb:288:in `build'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/document.rb:45:in `initialize'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/xml-simple-1.1.8/lib/xmlsimple.rb:979:in `parse'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/xml-simple-1.1.8/lib/xmlsimple.rb:164:in `xml_in'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/xml-simple-1.1.8/lib/xmlsimple.rb:203:in `xml_in'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-filter-xml-4.1.1/lib/logstash/filters/xml.rb:195:in `filter'", "/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:159:in `do_filter'", "/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:178:in `block in multi_filter'", "org/jruby/RubyArray.java:1809:in `each'", "/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:175:in `multi_filter'", "org/logstash/config/ir/compiler/AbstractFilterDelegatorExt.java:134:in `multi_filter'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:295:in `block in start_workers'"]}
[2021-09-29T00:15:08,724][ERROR][logstash.filters.ruby    ][main][751a41a06aab89d24f4be6b93109dbbf7cc10164db03ff696a3a7f56a3bce8b1] Ruby exception occurred: undefined method `[]' for nil:NilClass
{
      "@version" => "1",
          "tags" => [
        [0] "multiline",
        [1] "_xmlparsefailure",
        [2] "_rubyexception"
    ],
          "type" => "xml",
          "path" => "/data/A20210915.1500+0200-1505+0200_node21.xml",
          "host" => "05ae074f5dd0",
    "@timestamp" => 2021-09-29T00:15:07.650Z,
       "message" => "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<?xml-stylesheet type=\"text/xsl\" href=\"MeasDataCollection.xsl\"?>\n\n\t<fileHeader fileFormatVersion=\"32.435 V10.0\" vendorName=\"Ericsson AB\" dnPrefix=\"SubNetwork=ONRM_RootMo_R,SubNetwork=NODE,MeContext=NODE21\">\n\n\t\t\n\t\t<measInfo measInfoId=\"DiaNode\">\n\t\t\t<job jobId=\"NODE_System_SysDef_NOOSSCONTROL\"/>\n\t\t\t<granPeriod duration=\"PT300S\" endTime=\"2021-09-15T15:05:00+02:00\"/>\n\t\t\t<repPeriod duration=\"PT300S\"/>\n\t\t\t<measType p=\"1\">Diameter.EgressAnswMsg.Info</measType>\n\t\t\t<measType p=\"2\">Diameter.EgressAnswMsg.PermanentFailure</measType>\n\t\t\t<measType p=\"3\">Diameter.EgressAnswMsg.ProtocolError</measType>\n\t\t\t<measType p=\"4\">Diameter.EgressAnswMsg.Success</measType>\n\t\t\t<measType p=\"5\">Diameter.EgressAnswMsg.TotalCount</measType>\n\t\t\t<measType p=\"6\">Diameter.EgressAnswMsg.TransientFailure</measType>\n\t\t\t<measType p=\"7\">Diameter.EgressAnswMsgConnectionMgmt.TotalCount</measType>\n\t\t\t<measType p=\"8\">Diameter.EgressAnswMsgDiscarded.Congestion</measType>\n\t\t\t<measType p=\"9\">Diameter.EgressAnswMsgDiscarded.ConnectionLost</measType>\n\t\t\t<measType p=\"10\">Diameter.EgressAnswMsgDiscarded.TimeOut</measType>\n\t\t\t<measType p=\"11\">Diameter.EgressAnswMsgDiscarded.TotalCount</measType>\n\t\t\t<measType p=\"12\">Diameter.EgressAnswMsgPosted.TotalCount</measType>\n\t\t\t<measType p=\"13\">Diameter.EgressReqMsg.TotalCount</measType>\n\t\t\t<measType p=\"14\">Diameter.EgressReqMsgConnectionMgmt.TotalCount</measType>\n\t\t\t<measType p=\"15\">Diameter.EgressReqMsgDiscarded.Congestion</measType>\n\t\t\t<measType p=\"16\">Diameter.EgressReqMsgDiscarded.ConnectionLost</measType>\n\t\t\t<measType p=\"17\">Diameter.EgressReqMsgDiscarded.Routing</measType>\n\t\t\t<measType p=\"18\">Diameter.EgressReqMsgDiscarded.TimeOut</measType>\n\t\t\t<measType p=\"19\">Diameter.EgressReqMsgDiscarded.TotalCount</measType>\n\t\t\t<measType 
p=\"20\">Diameter.EgressReqMsgPosted.TotalCount</measType>\n\t\t\t<measType p=\"21\">Diameter.EgressReqMsgResent.ConnectionLost</measType>\n\t\t\t<measType p=\"22\">Diameter.EgressReqMsgResent.ProtocolError</measType>\n\t\t\t<measType p=\"23\">Diameter.EgressReqMsgResent.TimeOut</measType>\n\t\t\t<measType p=\"24\">Diameter.EgressReqMsgResent.TotalCount</measType>\n\t\t\t<measType p=\"25\">Diameter.EgressReqMsgResent.TransientFailure</measType>\n\t\t\t<measType p=\"26\">Diameter.IngressAnswMsg.Info</measType>\n\t\t\t<measType 

One more thing: in the second approach, it would be better if we put the content name alongside the number. Is that possible? These elements change dynamically.

<measInfo measInfoId="DiaNode">
			<job jobId="NODE_System_SysDef_NOOSSCONTROL"/>
			<granPeriod duration="PT300S" endTime="2021-09-15T15:05:00+02:00"/>
			<repPeriod duration="PT300S"/>
			<measType p="1">Diameter.EgressAnswMsg.Info</measType>
			<measType p="2">Diameter.EgressAnswMsg.PermanentFailure</measType>
			<measType p="3">Diameter.EgressAnswMsg.ProtocolError</measType>
			<measType p="4">Diameter.EgressAnswMsg.Success</measType>
			<measType p="5">Diameter.EgressAnswMsg.TotalCount</measType>
			<measType p="6">Diameter.EgressAnswMsg.TransientFailure</measType>
			<measType p="7">Diameter.EgressAnswMsgConnectionMgmt.TotalCount</measType>
<measValue measObjLdn="DiaNode=NODE21fe.epc.mnc002.mcc260.3gppnetwork.org">
				<r p="1">0</r>
				<r p="2">12</r>
				<r p="3">0</r>
				<r p="4">999931</r>
				<r p="5">999943</r>
				<r p="6">0</r>
				<r p="7">148</r>
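
On using names rather than numbers: one way to think about it is to build a lookup from each measType position to its name, then re-key every r value through that lookup. The sketch below is plain Ruby, not a tested Logstash filter. It assumes the xml filter was run with force_array => true and that element text ends up under a "content" key (the XmlSimple convention); the helper name name_counters is made up for illustration.

```ruby
# Assumed shape of one parsed <measInfo> element: attributes become keys,
# element text is stored under "content" (an assumption, check your own output).
def name_counters(meas_info)
  # Build a lookup from position ("1", "2", ...) to counter name.
  names = (meas_info["measType"] || []).each_with_object({}) do |t, acc|
    acc[t["p"]] = t["content"]
  end

  # Emit one hash per <measValue>, keyed by counter name instead of position.
  (meas_info["measValue"] || []).map do |mv|
    row = { "measObjLdn" => mv["measObjLdn"] }
    (mv["r"] || []).each do |r|
      # Fall back to the raw position if no matching measType exists.
      row[names.fetch(r["p"], r["p"])] = r["content"]
    end
    row
  end
end

sample = {
  "measInfoId" => "DiaNode",
  "measType" => [
    { "p" => "1", "content" => "Diameter.EgressAnswMsg.Info" },
    { "p" => "2", "content" => "Diameter.EgressAnswMsg.PermanentFailure" }
  ],
  "measValue" => [
    { "measObjLdn" => "DiaNode=NODE21fe.epc.mnc002.mcc260.3gppnetwork.org",
      "r" => [{ "p" => "1", "content" => "0" }, { "p" => "2", "content" => "12" }] }
  ]
}

puts name_counters(sample).inspect
```

Because the lookup is rebuilt for every measInfo, it does not matter that the measType list differs between measInfoId values. Counter names containing dots, such as Diameter.EgressAnswMsg.Info, are usually treated by Elasticsearch as nested object paths, so that is worth checking against your mapping.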

That suggests that either your multiline codec is not working well or you are mutating the XML into an invalid format.

Remember that I said "Note that the code contains no error checking". One example of that is that it does

        measData = event.get("[theXML][measData]")
        ...
        measData[0]["measInfo"].each { |x|

If the xml filter fails then [theXML] will not exist, let alone an array called [theXML][measData][0]["measInfo"], so that will get the exact

Ruby exception occurred: undefined method `[]' for nil:NilClass

exception that you see in your logs. The two ruby filters I posted were examples of how you could process the data. It's not production ready code.
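
A minimal sketch of the kind of error checking that is missing, written as plain Ruby so it can be tried outside Logstash. The helper name meas_infos is hypothetical; inside a ruby filter you would pass it the result of event.get("[theXML]") and iterate over whatever it returns.

```ruby
# Hypothetical helper: returns the measInfo array, or an empty array when
# any level of the expected structure is missing or has the wrong type.
def meas_infos(parsed)
  # The xml filter failed, so the target field was never created.
  return [] unless parsed.is_a?(Hash)

  meas_data = parsed["measData"]
  # measData is expected to be an array whose first entry is a hash.
  return [] unless meas_data.is_a?(Array) && meas_data[0].is_a?(Hash)

  info = meas_data[0]["measInfo"]
  info.is_a?(Array) ? info : []
end

# With a missing field the loop body simply never runs, instead of
# raising "undefined method `[]' for nil:NilClass".
meas_infos(nil).each { |x| puts x["measInfoId"] }

ok = { "measData" => [{ "measInfo" => [{ "measInfoId" => "Node1" }] }] }
meas_infos(ok).each { |x| puts x["measInfoId"] }
```

Returning an empty array keeps the calling code unchanged. If you would rather see the failures, you could tag the event at the point where the helper returns early.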

As to the format and whether the join is useful, that's really up to you. I cannot do your data design for you.

No, the first split filter splits the event into 11 events, one for each measInfo element. The second splits those into several, or even hundreds, of separate events, one for each measData element.
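
To illustrate that fan-out, here is a simplified model of two chained splits in plain Ruby. It is not the split filter itself: split_field is a made-up helper, events are plain hashes, and the step that lifts measData to the top level stands in for whatever processing sits between the two splits in your pipeline.

```ruby
# Simplified model of splitting on an array-valued top-level field:
# one output event per array element, with the field replaced by that element.
def split_field(events, field)
  events.flat_map do |event|
    value = event[field]
    # Non-array values pass through unchanged in this sketch. The real filter
    # also handles strings and tags other types with _split_type_failure.
    next [event] unless value.is_a?(Array)
    value.map { |item| event.merge(field => item) }
  end
end

doc = {
  "measInfo" => [
    { "measInfoId" => "Node1",
      "measData" => [{ "cell" => "Gbg-997" }, { "cell" => "Gbg-998" }, { "cell" => "Gbg-999" }] },
    { "measInfoId" => "Node2",
      "measData" => [{ "cell" => "Gbg-1000" }, { "cell" => "Gbg-1001" }, { "cell" => "Gbg-1002" }] }
  ]
}

# First split: one event per measInfo element.
per_info = split_field([doc], "measInfo")

# Lift the nested array to the top level so it can be split in turn.
lifted = per_info.map do |e|
  { "measInfoId" => e["measInfo"]["measInfoId"], "measData" => e["measInfo"]["measData"] }
end

# Second split: one event per measData element.
per_value = split_field(lifted, "measData")

puts per_info.length   # 2
puts per_value.length  # 6
```

If the field being split does not exist, the event passes through untouched, which corresponds to the "Only String and Array types are splittable" warning in the real filter.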

OK, after changing force_array => false to force_array => true the output looks better, but "measInfo" is now missing.

This is /data/test_data.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="MeasDataCollection.xsl"?>
<measCollecFile xmlns="http://www.3gpp.org/ftp/specs/archive/32_series/32.435#measCollec"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.3gpp.org/ftp/specs/archive/32_series/32.435#measCollec
http://www.3gpp.org/ftp/specs/archive/32_series/32.435#measCollec">

 <fileHeader fileFormatVersion="32.435 V7.0" vendorName="Company NN" dnPrefix="DC=a1.companyNN.com,SubNetwork=1,IRPAgent=1">
 <fileSender localDn="SubNetwork=CountryNN,MeContext=MEC-Gbg-1,ManagedElement=RNC-Gbg-1" elementType="RNC"/>
 <measCollec beginTime="2000-03-01T14:00:00+02:00"/>
 </fileHeader>
 <measData>
 <managedElement localDn="SubNetwork=CountryNN,MeContext=MEC-Gbg-1,ManagedElement=RNC-Gbg-1" userLabel="RNC Telecomville"/>
 <measInfo measInfoId="Node1">
 <job jobId="1231"/>
 <granPeriod duration="PT900S" endTime="2000-03-01T14:14:30+02:00"/>
 <repPeriod duration="PT1800S"/>
 <measType p="1">attTCHSeizures</measType>
 <measType p="2">succTCHSeizures</measType>
 <measType p="3">attImmediateAssignProcs</measType>
 <measType p="4">succImmediateAssignProcs</measType>
 <measValue measObjLdn="RncFunction=RF-1,UtranCell=Gbg-997">
 <r p="1">234</r>
 <r p="2">345</r>
 <r p="3">567</r>
 <r p="4">789</r>
 </measValue>
 <measValue measObjLdn="RncFunction=RF-1,UtranCell=Gbg-998">
 <r p="1">890</r>
 <r p="2">901</r>
 <r p="3">123</r>
 <r p="4">234</r>
 </measValue>
 <measValue measObjLdn="RncFunction=RF-1,UtranCell=Gbg-999">
 <r p="1">456</r>
 <r p="2">567</r>
 <r p="3">678</r>
 <r p="4">789</r>
 <suspect>true</suspect>
 </measValue>
 </measInfo>
 <measInfo measInfoId="Node2">
 <job jobId="1232"/>
 <granPeriod duration="PT1000s" endTime="2000-03-01T14:14:30+02:00"/>
 <repPeriod duration="PT1000S"/>
 <measType p="1">attTCHSeizures2</measType>
 <measType p="2">succTCHSeizures2</measType>
 <measType p="3">attImmediateAssignProcs2</measType>
 <measType p="4">succImmediateAssignProcs2</measType>
 <measValue measObjLdn="RncFunction=RF-1,UtranCell=Gbg-1000">
 <r p="1">234</r>
 <r p="2">345</r>
 <r p="3">567</r>
 <r p="4">789</r>
 </measValue>
 <measValue measObjLdn="RncFunction=RF-1,UtranCell=Gbg-1001">
 <r p="1">890</r>
 <r p="2">901</r>
 <r p="3">123</r>
 <r p="4">234</r>
 </measValue>
 <measValue measObjLdn="RncFunction=RF-1,UtranCell=Gbg-1002">
 <r p="1">456</r>
 <r p="2">567</r>
 <r p="3">678</r>
 <r p="4">789</r>
 <suspect>true</suspect>
 </measValue>
 </measInfo>
 </measData>
 <fileFooter>
 <measCollec endTime="2000-03-01T14:15:00+02:00"/>
 </fileFooter>
</measCollecFile>

results:

[2021-10-14T16:29:50,850][WARN ][logstash.filters.split   ][main][981bee6035d2dd5cbc036bf6fa70c000af1303ce9f77f813c8d27bda64267a7d] Only String and Array types are splittable. field:measInfo is of type = NilClass
{
    "@timestamp" => 2021-10-14T16:29:49.294Z,
          "type" => "xml",
          "path" => "/data/test_data.xml",
      "measData" => [
        [0] {
                                 "job" => {
                "jobId" => "1231"
            },
            "succImmediateAssignProcs" => "789",
                          "granPeriod" => {
                 "endTime" => "2000-03-01T14:14:30+02:00",
                "duration" => "PT900S"
            },
                      "managedElement" => {
                  "localDn" => "SubNetwork=CountryNN,MeContext=MEC-Gbg-1,ManagedElement=RNC-Gbg-1",
                "userLabel" => "RNC Telecomville"
            },
             "attImmediateAssignProcs" => "567",
                     "succTCHSeizures" => "345",
                      "attTCHSeizures" => "234",
                          "measInfoId" => "RncFunction=RF-1,UtranCell=Gbg-997",
                           "repPeriod" => {
                "duration" => "PT1800S"
            }
        },
        [1] {
                                 "job" => {
                "jobId" => "1231"
            },
            "succImmediateAssignProcs" => "234",
                          "granPeriod" => {
                 "endTime" => "2000-03-01T14:14:30+02:00",
                "duration" => "PT900S"
            },
                      "managedElement" => {
                  "localDn" => "SubNetwork=CountryNN,MeContext=MEC-Gbg-1,ManagedElement=RNC-Gbg-1",
                "userLabel" => "RNC Telecomville"
            },
             "attImmediateAssignProcs" => "123",
                     "succTCHSeizures" => "901",
                      "attTCHSeizures" => "890",
                          "measInfoId" => "RncFunction=RF-1,UtranCell=Gbg-998",
                           "repPeriod" => {
                "duration" => "PT1800S"
            }
        },
        [2] {
                                 "job" => {
                "jobId" => "1231"
            },
            "succImmediateAssignProcs" => "789",
                          "granPeriod" => {
                 "endTime" => "2000-03-01T14:14:30+02:00",
                "duration" => "PT900S"
            },
                      "managedElement" => {
                  "localDn" => "SubNetwork=CountryNN,MeContext=MEC-Gbg-1,ManagedElement=RNC-Gbg-1",
                "userLabel" => "RNC Telecomville"
            },
             "attImmediateAssignProcs" => "678",
                     "succTCHSeizures" => "567",
                      "attTCHSeizures" => "456",
                          "measInfoId" => "RncFunction=RF-1,UtranCell=Gbg-999",
                           "repPeriod" => {
                "duration" => "PT1800S"
            }
        }
    ],
          "host" => "a46a02ab2386",
      "@version" => "1",
          "tags" => [
        [0] "multiline",
        [1] "_split_type_failure"
    ]
}
{
    "@timestamp" => 2021-10-14T16:29:49.294Z,
          "type" => "xml",
          "path" => "/data/test_data.xml",
      "measData" => [
        [0] {
                                  "job" => {
                "jobId" => "1232"
            },
                      "attTCHSeizures2" => "234",
                           "granPeriod" => {
                 "endTime" => "2000-03-01T14:14:30+02:00",
                "duration" => "PT1000s"
            },
                       "managedElement" => {
                  "localDn" => "SubNetwork=CountryNN,MeContext=MEC-Gbg-1,ManagedElement=RNC-Gbg-1",
                "userLabel" => "RNC Telecomville"
            },
             "attImmediateAssignProcs2" => "567",
            "succImmediateAssignProcs2" => "789",
                           "measInfoId" => "RncFunction=RF-1,UtranCell=Gbg-1000",
                            "repPeriod" => {
                "duration" => "PT1000S"
            },
                     "succTCHSeizures2" => "345"
        },
        [1] {
                                  "job" => {
                "jobId" => "1232"
            },
                      "attTCHSeizures2" => "890",
                           "granPeriod" => {
                 "endTime" => "2000-03-01T14:14:30+02:00",
                "duration" => "PT1000s"
            },
                       "managedElement" => {
                  "localDn" => "SubNetwork=CountryNN,MeContext=MEC-Gbg-1,ManagedElement=RNC-Gbg-1",
                "userLabel" => "RNC Telecomville"
            },
             "attImmediateAssignProcs2" => "123",
            "succImmediateAssignProcs2" => "234",
                           "measInfoId" => "RncFunction=RF-1,UtranCell=Gbg-1001",
                            "repPeriod" => {
                "duration" => "PT1000S"
            },
                     "succTCHSeizures2" => "901"
        },
        [2] {
                                  "job" => {
                "jobId" => "1232"
            },
                      "attTCHSeizures2" => "456",
                           "granPeriod" => {
                 "endTime" => "2000-03-01T14:14:30+02:00",
                "duration" => "PT1000s"
            },
                       "managedElement" => {
                  "localDn" => "SubNetwork=CountryNN,MeContext=MEC-Gbg-1,ManagedElement=RNC-Gbg-1",
                "userLabel" => "RNC Telecomville"
            },
             "attImmediateAssignProcs2" => "678",
            "succImmediateAssignProcs2" => "789",
                           "measInfoId" => "RncFunction=RF-1,UtranCell=Gbg-1002",
                            "repPeriod" => {
                "duration" => "PT1000S"
            },
                     "succTCHSeizures2" => "567"
        }
    ],
          "host" => "a46a02ab2386",
      "@version" => "1",
          "tags" => [
        [0] "multiline",
        [1] "_split_type_failure"
    ]
}
