Parsing nested XML with Logstash

Hi,

I'm parsing XML data from an endpoint system with Logstash, and I'm not sure how to parse the nested data in the XML.

Here's the sample data:

 <eventItem sequence_num="206423128" uid="18960533">
  <timestamp>2019-02-04T08:18:55.430Z</timestamp>
  <eventType>processEvent</eventType>
  <details>
   <detail>
    <name>eventType</name>
    <value>end</value>
   </detail>
   <detail>
    <name>pid</name>
    <value>1188</value>
   </detail>
   <detail>
    <name>processPath</name>
    <value>C:\Windows\System32\svchost.exe</value>
   </detail>
   <detail>
    <name>process</name>
    <value>svchost.exe</value>
   </detail>
   <detail>
    <name>parentPid</name>
    <value>792</value>
   </detail>
   <detail>
    <name>parentProcessPath</name>
    <value>C:\Windows\System32\services.exe</value>
   </detail>
   <detail>
    <name>parentProcess</name>
    <value>services.exe</value>
   </detail>
   <detail>
    <name>username</name>
    <value>user1234</value>
   </detail>
   <detail>
    <name>startTime</name>
    <value>2019-02-04T08:18:55.430Z</value>
   </detail>
  </details>
 </eventItem>

And the Logstash config at the moment:

input {
    file {
        path => "/path/to/data/testdataset.xml"
        start_position => "beginning"
        sincedb_path => "/dev/null"
        codec => multiline {
            pattern => "^\s<eventItem"
            negate => "true"
            what => "previous"
        }
    }
}

filter {
    xml {
        source => "message"
        store_xml => true
        target => "agentevent"
        #xpath => [ "/eventItems/eventType/text()", "eventType" ]
    }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "xml_test"
  }
}

This produces the result I'm looking for, except for the details. That element is where the details of an event are populated, and the number of details varies by the type of event.

At the moment the events are parsed as follows:

"agentevent": {
  "details": [
    {
      "detail": [
        {
          "value": [
            "end"
          ],
          "name": [
            "eventType"
          ]
        },
        {
          "value": [
            "1188"
          ],
          "name": [
            "pid"
          ]
        },
        {
          "value": [
            "C:\\Windows\\System32\\svchost.exe"
          ],
          "name": [
            "processPath"
          ]
        },
        {
          "value": [
            "svchost.exe"
          ],
          "name": [
            "process"
          ]
        },
        {
          "value": [
            "792"
          ],
          "name": [
            "parentPid"
          ]
        },
        {
          "value": [
            "C:\\Windows\\System32\\services.exe"
          ],
          "name": [
            "parentProcessPath"
          ]
        },
        {
          "value": [
            "services.exe"
          ],
          "name": [
            "parentProcess"
          ]
        },
        {
          "value": [
            "user1234"
          ],
          "name": [
            "username"
          ]
        },
        {
          "value": [
            "2019-02-04T08:18:55.430Z"
          ],
          "name": [
            "startTime"
          ]
        }
      ]
    }
  ],
  "sequence_num": "206423128",
  "uid": "18960533",
  "timestamp": [
    "2019-02-04T08:18:55.430Z"
  ],
  "eventType": [
    "processEvent"
  ]
}

What I would like is for each detail to populate a field named by <name> with the value of <value>, for example:

 <detail>
  <name>pid</name>
  <value>1188</value>
 </detail>

to produce the field pid: 1188

but I'm not sure how to achieve this.

You would need to use a ruby filter. Something like this.
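
As a rough illustration of what such a ruby filter could do (this is not the code from the linked thread, only a sketch), the core of it is a loop that walks the details structure and collects each name/value pair. The helper names as_list and flatten_details are made up for this example, and the sample data is a shortened copy of the parsed output shown above.

```ruby
# Hypothetical helper: wraps a single value in an array so that both
# "one element" and "many elements" shapes can be iterated the same way.
def as_list(value)
  value.is_a?(Array) ? value : [value].compact
end

# Hypothetical helper: turns the nested "details" structure produced by the
# xml filter into a flat { name => value } hash.
def flatten_details(details)
  flat = {}
  as_list(details).each do |group|
    next unless group.is_a?(Hash)
    as_list(group["detail"]).each do |detail|
      next unless detail.is_a?(Hash)
      name  = as_list(detail["name"]).first
      value = as_list(detail["value"]).first
      flat[name] = value unless name.nil?
    end
  end
  flat
end

# Shortened sample shaped like the parsed output in the question.
details = [
  { "detail" => [
    { "name" => ["eventType"], "value" => ["end"] },
    { "name" => ["pid"],       "value" => ["1188"] }
  ] }
]

fields = flatten_details(details)
fields.each { |name, value| puts "#{name}: #{value}" }
```

Inside a Logstash ruby filter, the same loop would read the structure with event.get("[agentevent][details]") and write each pair with event.set(name, value). Note that one of the details is itself named eventType, so writing the pairs under [agentevent] rather than at the top level of the event would overwrite the existing eventType value there.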

Okay,

So how do I point the field agentevent.details to the json filter?

I tried

json {
    source => "agentevent.details"
}

But this gives an error.

In the thread you linked, that field had already been created, but how can I access it when the data comes from XML? If I use xpath to create the fields before the mutate, I will have all the names and values in those fields:

mutate { add_field => { "%{[Request][Headers][Name]}" => "%{[Request][Headers][Value]}" } }

That should be [agentevent][details]
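
To make the difference concrete, here is a small sketch of how a bracketed reference maps onto nested data while a dotted string does not. It is not Logstash's actual implementation; reference_path and lookup are hypothetical helpers, and the event data is a trimmed stand-in for the parsed event above.

```ruby
# Hypothetical helper: splits a bracketed field reference such as
# "[agentevent][details]" into its path segments. A string without
# brackets is treated as a single top-level field name.
def reference_path(ref)
  segments = ref.scan(/\[([^\[\]]+)\]/).flatten
  segments.empty? ? [ref] : segments
end

# Hypothetical helper: follows the path segments through nested hashes.
def lookup(data, ref)
  data.dig(*reference_path(ref))
end

# Trimmed stand-in for the event produced by the xml filter.
event_data = {
  "agentevent" => {
    "details" => [{ "detail" => [{ "name" => ["pid"], "value" => ["1188"] }] }]
  }
}

nested = lookup(event_data, "[agentevent][details]")  # walks agentevent, then details
dotted = lookup(event_data, "agentevent.details")     # looks for one field with a dot in its name

puts nested.nil?  # false
puts dotted.nil?  # true
```

In this sketch the dotted form finds nothing because there is no top-level field literally called "agentevent.details", which is one plausible reason the earlier attempt failed.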
