Logstash XML parsing problem


(lennart) #1
Hello! I've spent almost nine hours in total trying to figure out how to parse XML in Logstash. There isn't much information about it, several of the sources I found are quite old, and I'm terrible at coming up with the right keywords to find relevant information. I've made this very simple configuration to show exactly what my problem is:

input {
   udp {
      port => 2055
   }
}

#xml looks something like this
#<7><LOG><TIMESTAMP>mytimestamp</TIMESTAMP><EVENT>myevent</EVENT><INFO>myinfo</INFO></LOG>
#how do I parse the xml so that timestamp, event and info have their own fields? The following didn't work
filter {
   xml {
      source => "message"
      force_array => false
      store_xml => false
      xpath => ["/LOG/TIMESTAMP", "timestamp",
                "/LOG/EVENT", "event",
                "/LOG/INFO", "info"]
   }
   mutate {
      add_field => {
         "timestamp" => "%{timestamp}"
         "event" => "%{event}"
         "info" => "%{info}"
      }
   }
}

output {
   elasticsearch {
      hosts => "localhost:9200"
      manage_template => false
      index => "logstash-xml-%{+YYYY.MM.dd}"
   }
}

So basically, I want to turn this:

message: <7><LOG><TIMESTAMP>mytimestamp</TIMESTAMP><EVENT>myevent</EVENT><INFO>myinfo</INFO></LOG>

into this:

timestamp: mytimestamp
event: myevent
info: myinfo

Thanks in advance!

(Badger) #2

If you test this with a simple config like

input { stdin {} }
output { stdout { codec => rubydebug } }

filter {
  xml {
    source => "message"
    target => "theXML"
  }
}

You will see that you are getting a _xmlparsefailure tag, which I would imagine is because <7> is not valid XML: it is never closed, and XML element names cannot start with a digit anyway. I have no idea what you want to do with that, but one approach would be to strip it off using mutate { gsub => [ "message", "^[^>]+>", "" ] }, in which case the XML parser is happy:

{
      "@version" => "1",
          "host" => "[...]t",
       "theXML" => {
             "INFO" => [
            [0] "myinfo"
        ],
            "EVENT" => [
            [0] "myevent"
        ],
        "TIMESTAMP" => [
            [0] "mytimestamp"
        ]
    },
    "@timestamp" => 2017-12-20T17:07:34.419Z,
       "message" => "<LOG><TIMESTAMP>mytimestamp</TIMESTAMP><EVENT>myevent</EVENT><INFO>myinfo</INFO></LOG>"
}
Note that everything is an array, so you may need to adjust your xpath expressions accordingly.
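If the <7> carries information worth keeping (it looks like a syslog priority value, but that is only a guess from its format), grok can split it off into its own field instead of deleting it. A minimal sketch for the same stdin test, assuming every line is "<number>" immediately followed by the XML document; the field names syslog_pri and xml_body are made up for illustration:

```
input { stdin {} }

filter {
  # Assumption: each line is "<digits>" followed by the XML document.
  # Keep the number in its own field and put the remaining XML in another field.
  grok {
    match => { "message" => "^<%{NONNEGINT:syslog_pri}>%{GREEDYDATA:xml_body}" }
  }
  # Parse only the XML part; "message" keeps the original line unchanged.
  xml {
    source => "xml_body"
    target => "theXML"
  }
}

output { stdout { codec => rubydebug } }
```

If a line does not start with a bracketed number, grok tags the event with _grokparsefailure and xml_body is never created, so the xml filter should have nothing to parse for that event.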

(lennart) #3

EDIT: This worked perfectly! So basically, the <7> was the main cause of all my frustration in trying to make this work. Thanks for your help, Badger!

input {
   udp {
      port => 2055
   }
}

filter {
   # Strip the leading "<7>" so the rest of the message is well-formed XML
   mutate { gsub => [ "message", "^[^>]+>", "" ] }
   xml {
      source => "message"
      force_array => false
      store_xml => false
      # text() selects each element's text content instead of the whole element
      xpath => ["/LOG/TIMESTAMP/text()", "timestamp",
                "/LOG/EVENT/text()", "event",
                "/LOG/INFO/text()", "info"]
   }
}

output {
   elasticsearch {
      hosts => "localhost:9200"
      manage_template => false
      index => "logstash-xml-%{+YYYY.MM.dd}"
   }
}
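Two possible refinements to the filter above, offered as untested assumptions rather than anything confirmed in this thread. First, "^[^>]+>" removes everything up to the first ">" whether or not a numeric prefix is present, so a message that arrives without <7> would lose its <LOG> tag instead; anchoring on "^<[0-9]+>" only strips a bracketed number. Second, as far as I know the xml filter stores xpath matches as arrays (here, one-element arrays), and force_array only affects store_xml output. If plain string fields are preferred, mutate's replace option can copy the first element back:

```
filter {
  # Remove only a leading "<digits>" prefix such as "<7>"; other messages are left alone
  mutate { gsub => [ "message", "^<[0-9]+>", "" ] }
  xml {
    source => "message"
    store_xml => false
    xpath => [
      "/LOG/TIMESTAMP/text()", "timestamp",
      "/LOG/EVENT/text()", "event",
      "/LOG/INFO/text()", "info"
    ]
  }
  # Replace each one-element array with its first element as a string.
  # The conditionals skip fields the XPath did not match, which would otherwise
  # end up holding the literal text "%{[field][0]}".
  if [timestamp] { mutate { replace => { "timestamp" => "%{[timestamp][0]}" } } }
  if [event]     { mutate { replace => { "event" => "%{[event][0]}" } } }
  if [info]      { mutate { replace => { "info" => "%{[info][0]}" } } }
}
```

For Elasticsearch the flattening is mostly cosmetic, since a one-element array is indexed the same way as a single value.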
