Handling a nested XML file with Logstash and a Ruby script

Hello,
I have the following XML structure:

<?xml version="1.0" ?>
<XMLdata>
    <Policy>
        <policyName>Arena Standard</policyName>
            <Preferences>
                <ServerPreferences>max_simult_tcp_sessions></ServerPreferences>
            </Preferences>
    </Policy>
    <Report name="Scan">
        <ReportHost name="huit.com">
            <HostProperties>
                <tag name="LastUnauthenticatedResults">111111</tag>
                <tag name="Credentialed_Scan">false</tag>
            </HostProperties>
            <ReportItem port="0" svc_name="general">
                <description>55555</description>
                <risk>none</risk>
            </ReportItem>
        </ReportHost>
        <ReportHost name="1.2.3.4">
            <HostProperties>
                <tag name="LastUnauthenticatedResults">22222</tag>
                <tag name="Credentialed_Scan">true</tag>
            </HostProperties>
            <ReportItem port="15672" svc_name="general">
                <description>9999</description>
                <risk>none</risk>
            </ReportItem>
        </ReportHost>
    </Report>
</XMLdata>

I’d like Logstash to output this structure for each ReportHost:

{
 "name": "huit.com",
 "LastUnauthenticatedResults": "111111",
 "Credentialed_Scan": "false",
 "Port": "0",
 "svc_name": "general",
 "description": "55555",
 "risk": "none"
}

I tried the xml and ruby filters, following an answer posted on Stack Overflow.
This is my config:

input {
  file {
    path => "/home/vagrant/data/test.xml"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => multiline {
      pattern => "<XMLdata>"
      negate => "true"
      what => "previous"
      auto_flush_interval => 1
      max_lines => 333333
    }
  }
}

filter {
  xml {
    store_xml => "false"
    source => "message"
    target => "parsed"
  }
  ruby {
    code => '
     # event.set("Preference", event.get("[parsed][ServerPreferences][0][ServerPreverences]"))
     # event.set("HostProperties", event.get("[parsed][ReportHost][HostProperties][0][tag]"))
       event.set("HostName", event.get("[parsed][Report][0][name]"))
       event.set("Port", event.get("[parsed][Report][ReportHost][ReportItem][0][port]"))
    '
  }
  mutate {
  remove_field => ["parsed","@version","message"]
  }
}

output {
  stdout { }
}

This is my output:

{
    "@timestamp" => 2018-10-23T12:57:04.661Z,
          "tags" => [
        [0] "multiline"
    ],
      "HostName" => nil,
          "path" => "/home/vagrant/test.xml",
          "Port" => nil,
          "host" => "localhost"
}

Why do I get nil where I expect my values? Am I using the ruby filter wrong? Any help would be appreciated.

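Get rid of the Ruby code and use the xpath option of the xml filter. Your lookups return nil because store_xml => "false" stops the filter from writing anything to the parsed target, so event.get has nothing to read. Below is between 90% and 100% of what you need. I'm not sure about the syntax for selecting by attribute (i.e. ReportHost name="huit.com"), so you may need to tweak that.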

filter {
  xml {
    store_xml => "false"
    source => "message"
    # The root element is <XMLdata>; XPath names are case-sensitive.
    # @attr selects an attribute; [@attr='value'] filters elements by attribute.
    xpath => [
      "/XMLdata/Report/ReportHost/@name", "Name",
      "/XMLdata/Report/ReportHost/HostProperties/tag[@name='LastUnauthenticatedResults']/text()", "LastUnauthenticatedResults",
      "/XMLdata/Report/ReportHost/HostProperties/tag[@name='Credentialed_Scan']/text()", "Credentialed_Scan",
      "/XMLdata/Report/ReportHost/ReportItem/@port", "Port",
      "/XMLdata/Report/ReportHost/ReportItem/@svc_name", "Svc_name",
      "/XMLdata/Report/ReportHost/ReportItem/description/text()", "Description",
      "/XMLdata/Report/ReportHost/ReportItem/risk/text()", "Risk"
    ]
  }
}
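
XPath expressions of this kind can be checked outside Logstash by running them against the sample document from the question. The following is a minimal sketch, assuming Ruby's REXML library as a stand-in (the xml filter uses a different XML library, so edge cases may behave differently). The helper xpath_values is hypothetical and not part of Logstash.

```ruby
require "rexml/document"

# Sample document from the question, trimmed to the <Report> part.
SAMPLE_XML = <<~XML
  <XMLdata>
    <Report name="Scan">
      <ReportHost name="huit.com">
        <HostProperties>
          <tag name="LastUnauthenticatedResults">111111</tag>
          <tag name="Credentialed_Scan">false</tag>
        </HostProperties>
        <ReportItem port="0" svc_name="general">
          <description>55555</description>
          <risk>none</risk>
        </ReportItem>
      </ReportHost>
      <ReportHost name="1.2.3.4">
        <HostProperties>
          <tag name="LastUnauthenticatedResults">22222</tag>
          <tag name="Credentialed_Scan">true</tag>
        </HostProperties>
        <ReportItem port="15672" svc_name="general">
          <description>9999</description>
          <risk>none</risk>
        </ReportItem>
      </ReportHost>
    </Report>
  </XMLdata>
XML

# Hypothetical helper: evaluate one XPath expression and return all matches as strings.
def xpath_values(doc, expression)
  REXML::XPath.match(doc, expression).map do |node|
    # Attribute and text nodes both expose #value; fall back to #to_s otherwise.
    node.respond_to?(:value) ? node.value : node.to_s
  end
end

doc = REXML::Document.new(SAMPLE_XML)

# @name selects the attribute itself.
names = xpath_values(doc, "/XMLdata/Report/ReportHost/@name")
# [@name='...'] filters <tag> elements by their name attribute.
results = xpath_values(doc, "/XMLdata/Report/ReportHost/HostProperties/tag[@name='LastUnauthenticatedResults']/text()")
ports = xpath_values(doc, "/XMLdata/Report/ReportHost/ReportItem/@port")
# Wrong case in the root element name, kept here for comparison.
wrong_case = xpath_values(doc, "/XMLData/Report/ReportHost/@name")

p names, results, ports, wrong_case
```

Each expression matches once per ReportHost, so with two hosts in the file every field comes back as a list of two values rather than a single string. The expression with the root spelled XMLData matches nothing.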

I had about 70% of the xml filter config, but I didn't know how to get the attributes of an element. Thank you.
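
The output asked for in the question is one flat record per ReportHost, which means walking the hosts one at a time rather than collecting each field across the whole file. Below is a rough sketch of that grouping in plain Ruby, again assuming REXML. The method name host_records is hypothetical, and only the first ReportItem of each host is read, which is a simplification that fits the sample data. Inside Logstash, a similar loop could presumably run in a ruby filter, followed by a split filter to turn the resulting list into separate events.

```ruby
require "rexml/document"

# Shortened sample with the same shape as the document in the question.
HOSTS_XML = <<~XML
  <XMLdata>
    <Report name="Scan">
      <ReportHost name="huit.com">
        <HostProperties>
          <tag name="LastUnauthenticatedResults">111111</tag>
          <tag name="Credentialed_Scan">false</tag>
        </HostProperties>
        <ReportItem port="0" svc_name="general">
          <description>55555</description>
          <risk>none</risk>
        </ReportItem>
      </ReportHost>
      <ReportHost name="1.2.3.4">
        <HostProperties>
          <tag name="LastUnauthenticatedResults">22222</tag>
          <tag name="Credentialed_Scan">true</tag>
        </HostProperties>
        <ReportItem port="15672" svc_name="general">
          <description>9999</description>
          <risk>none</risk>
        </ReportItem>
      </ReportHost>
    </Report>
  </XMLdata>
XML

# Hypothetical helper: build one flat hash per <ReportHost>.
def host_records(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "/XMLdata/Report/ReportHost").map do |host|
    record = { "name" => host.attributes["name"] }

    # Every <tag name="..."> under HostProperties becomes a key/value pair.
    host.elements.each("HostProperties/tag") do |tag|
      record[tag.attributes["name"]] = tag.text
    end

    # Simplification: only the first <ReportItem> of each host is read.
    item = host.elements["ReportItem"]
    if item
      record["Port"] = item.attributes["port"]
      record["svc_name"] = item.attributes["svc_name"]
      record["description"] = item.elements["description"]&.text
      record["risk"] = item.elements["risk"]&.text
    end

    record
  end
end

records = host_records(HOSTS_XML)
records.each { |record| p record }
```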

Hello humalog,

You can extend the code below to get the required output,

