Logstash XML Filter Issue

Hi,

I'm new to Logstash and I'm trying to parse the following sample XML file. I've tried the XPath and split filter plugins and still can't get it working. Can someone please advise or guide me? Thanks a lot.

Sample XML file:

<?xml version="1.0" encoding="UTF-8"?>
<host="company1.com" port="443">
 <description="This is company A" />
  <sslversion="TLSv1.2" bits="128" cipher="ECDHE-RSA-AES128-GCM-SHA256" />
  <sslversion="TLSv1.1" bits="256" cipher="AES256-SHA" />
  <sslversion="TLSv1.0" bits="128" cipher="ECDHE-RSA-AES128-SHA" />
 <certificate>
  <not-valid-before>Mar 29 00:00:00 2018 GMT</not-valid-before>
  <not-valid-after>Dec 29 12:00:00 2019 GMT</not-valid-after>
 </certificate>
</host>
<host="company2.com" port="4443">
 <description="This is company B" />
  <sslversion="TLSv1.2" bits="128" cipher="ECDHE-RSA-AES128-GCM-SHA256" />
  <sslversion="TLSv1.1" bits="128" cipher="ECDHE-RSA-AES128-SHA" />
 <certificate>
  <not-valid-before>Mar 29 00:00:00 2017 GMT</not-valid-before>
  <not-valid-after>Dec 29 12:00:00 2018 GMT</not-valid-after>
 </certificate>
</host>

The intended output is:

{
    "@timestamp" => 2017-08-29T13:45:46.112Z,
      "@version" => "1",
          "host" => "company1.com",
          "port" => "443",
   "description" => "This is company A",
     "CipherInfo" => [
        [0] {
            "SSLVersion" => "TLSv1.2",
            "bits" => "128",
            "cipher" => "ECDHE-RSA-AES128-GCM-SHA256"
        },
        [1] {
            "SSLVersion" => "TLSv1.1",
            "bits" => "256",
            "cipher" => "AES256-SHA"
        },
        [2] {
            "SSLVersion" => "TLSv1.0",
            "bits" => "128",
            "cipher" => "ECDHE-RSA-AES128-SHA"
        }
    ],
    "CertValidity" => [
        [0] {
            "Not-Valid-Before" => "Mar 29 00:00:00 2018 GMT",
            "Not-Valid-After"  => "Dec 29 12:00:00 2019 GMT"
        }
    ]
}
{
    "@timestamp" => 2017-08-29T13:45:46.112Z,
      "@version" => "2",
          "host" => "company2.com",
          "port" => "4443",
   "description" => "This is company B",
     "CipherInfo" => [
        [0] {
            "SSLVersion" => "TLSv1.2",
            "bits" => "128",
            "cipher" => "ECDHE-RSA-AES128-GCM-SHA256"
        },
        [1] {
            "SSLVersion" => "TLSv1.1",
            "bits" => "128",
            "cipher" => "ECDHE-RSA-AES128-SHA"
        }
    ],
    "CertValidity" => [
        [0] {
            "Not-Valid-Before" => "Mar 29 00:00:00 2017 GMT",
            "Not-Valid-After"  => "Dec 29 12:00:00 2018 GMT"
        }
    ]
}

I do not believe that is valid XML. You can use a multiline codec on a file input to consume events:

codec => multiline { pattern => '^</host>' negate => true what => "next" auto_flush_interval => 1 }

That will get you events like

   "message" => "<host=\"company2.com\" port=\"4443\">\n <description=\"This is company B\" />\n  <sslversion=\"TLSv1.2\" bits=\"128\" cipher=\"ECDHE-RSA-AES128-GCM-SHA256\" />\n  <sslversion=\"TLSv1.1\" bits=\"128\" cipher=\"ECDHE-RSA-AES128-SHA\" />\n <certificate>\n  <not-valid-before>Mar 29 00:00:00 2017 GMT</not-valid-before>\n  <not-valid-after>Dec 29 12:00:00 2018 GMT</not-valid-after>\n </certificate>\n</host>"

which you can parse using grok.
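
As a rough illustration of the grok suggestion, here is a minimal sketch that pulls only the host-level values out of the multiline event shown above. The field name `hostname` is an assumption (chosen so it does not collide with the `host` field that Logstash may already set), and the pattern assumes the exact quoting and spacing of the sample. It does not attempt the repeated `sslversion` entries, which a single grok pattern does not handle well.

```
filter {
  grok {
    # grok is unanchored, so this matches inside the multiline message.
    # \s* spans the newline and indentation between the two tags.
    match => {
      "message" => '<host="%{DATA:hostname}" port="%{INT:port}">\s*<description="%{DATA:description}" />'
    }
  }
}
```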

Hi Badger,

Thanks for the update. I've got an updated XML file provided to me. Following is the updated XML file:

<?xml version="1.0" encoding="UTF-8"?>
<ssltest host="company1.com" port="443">
  <description="This is company A" />
  <sslversion="TLSv1.2" bits="128" cipher="ECDHE-RSA-AES128-GCM-SHA256" />
  <sslversion="TLSv1.1" bits="256" cipher="AES256-SHA" />
  <sslversion="TLSv1.0" bits="128" cipher="ECDHE-RSA-AES128-SHA" />
  <certificate>
   <not-valid-before>Mar 29 00:00:00 2018 GMT</not-valid-before>
   <not-valid-after>Dec 29 12:00:00 2019 GMT</not-valid-after>
  </certificate>
 </ssltest>
 <ssltest host="company2.com" port="4443">
   <description="This is company B" />
   <sslversion="TLSv1.2" bits="128" cipher="ECDHE-RSA-AES128-GCM-SHA256" />
   <sslversion="TLSv1.1" bits="128" cipher="ECDHE-RSA-AES128-SHA" />
   <certificate>
    <not-valid-before>Mar 29 00:00:00 2017 GMT</not-valid-before>
    <not-valid-after>Dec 29 12:00:00 2018 GMT</not-valid-after>
   </certificate>
 </ssltest>

I've used the config you provided. I tried using grok and couldn't get it working. So far this is what I have done:

input {
 file {
     path => "/home/ec2-user/logstash/sslscan/sample.xml"
     start_position => "beginning"
     sincedb_path => "/dev/null"
     codec => multiline {
       pattern => "^</ssltest>"
       negate => "true"
       what => "next"
       auto_flush_interval => 1
     }
 }
}

filter {
  xml {
    source => "message"
    target => "parsed"
    store_xml => "false"
    force_array => "false"
    xpath => ["/ssltest/@host", "host"]
    xpath => ["/ssltest/@port", "port"]
    xpath => ["/ssltest/description/text()", "description"]
  }
  xml {
    source => "message"
    target => "bleh"
    store_xml => "false"
    xpath => ["/ssltest/certificate", "CertValidity"]
    xpath => ["/ssltest/sslversion", "CipherInfo"]

  }

  mutate {
    remove_field => [ "host","path","message"]
  }

}

output {
  stdout { codec => rubydebug }
}

This is the output:

{
            "tags" => [
        [0] "multiline"
    ],
    "CertValidity" => [
        [0] "<certificate>\n   <not-valid-before>Mar 29 00:00:00 2018 GMT</not-valid-before>\n   <not-valid-after>Dec 29 12:00:00 2019 GMT</not-valid-after>\n  </certificate>"
    ],
      "@timestamp" => 2019-10-02T04:31:00.540Z,
            "port" => "443",
        "@version" => "1",
      "CipherInfo" => [
        [0] "<sslversion bits=\"128\" cipher=\"ECDHE-RSA-AES128-GCM-SHA256\" port=\"TLSv1.2\"/>",
        [1] "<sslversion bits=\"256\" cipher=\"AES256-SHA\"/>",
        [2] "<sslversion bits=\"128\" cipher=\"ECDHE-RSA-AES128-SHA\"/>"
    ]
}

What I'm trying to figure out at the moment is:

  1. How do I get the arrays for 'CipherInfo' and 'CertValidity'?
  2. Why are the hostname and description not picked up, and why are only the first certificate's details picked up?

Thanks!

You do not get [host] because you use mutate to remove it.
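
One way around that, sketched here under assumptions, is to write the XPath result to a differently named field so that removing `host` no longer discards it. The field name `ssl_host` is hypothetical; the other options are taken from the config posted above.

```
filter {
  xml {
    source => "message"
    store_xml => false
    force_array => false
    # "ssl_host" is an illustrative name, not something from the original config
    xpath => ["/ssltest/@host", "ssl_host"]
    xpath => ["/ssltest/@port", "port"]
  }
  mutate {
    # "host" (as added by the input) can still be removed without losing ssl_host
    remove_field => [ "host", "path", "message" ]
  }
}
```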

Incredible as it may seem, to get [description] you have to use

xpath => ["/ssltest/description/@port", "description"]

I am uncertain whether your XML is invalid or that is just a bug in the parser.

Only the first certificate is picked up because there is only one certificate in each ssltest element, and that is what the multiline match picks up.
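
For the first question (getting 'CipherInfo' and 'CertValidity' as structured data rather than XML strings), the following is a sketch only, and it assumes the XML can be regenerated as well-formed XML. The attribute name `version` and the text-content form of `description` are assumptions, not part of the file posted above. With `store_xml` enabled, the xml filter should convert attributes and child elements into nested fields under the target, which can then be renamed.

```
# Assumed well-formed input (hypothetical shape):
#   <ssltest host="company1.com" port="443">
#     <description>This is company A</description>
#     <sslversion version="TLSv1.2" bits="128" cipher="ECDHE-RSA-AES128-GCM-SHA256" />
#     ...
#   </ssltest>
filter {
  xml {
    source => "message"
    target => "parsed"
    store_xml => true
    force_array => false
  }
  mutate {
    rename => {
      "[parsed][sslversion]"  => "CipherInfo"
      "[parsed][certificate]" => "CertValidity"
    }
  }
}
```

Note that with `force_array => false`, an `ssltest` element containing a single `sslversion` would be expected to produce a hash rather than a one-element array, so downstream consumers may need to handle both shapes.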
