I am trying to parse an XML file and read only a single tag out of the entire document using grok patterns.
My grok pattern looks like this. It's able to parse the XML when it's properly indented, since there's a newline after each closing tag. But when the file comes with no whitespace between consecutive tags, the pattern does not work. Could anyone help here?
```
filter {
  # ignore log comments
  if [message] =~ "^#" {
    drop {}
  }
  grok {
    patterns_dir => "./patterns"
    match => ["message", "%{DATA:extras}<LoadID%{DATA:extra}>%{DATA:ASNNumber}%{GREEDYDATA:behind}"]
  }
  date {
    match => [ "timestamp", "yyyy-MM-dd HH:mm:ss" ]
    locale => "en"
  }
}
```
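One way to see why the capture can come back empty is to expand the pattern by hand. The sketch below assumes grok's stock definitions `DATA` = `.*?` and `GREEDYDATA` = `.*`, and uses Python's `re` as a stand-in for the Oniguruma engine grok runs on (both treat lazy and greedy quantifiers the same way here). The sample XML is hypothetical.

```python
import re

# Assumption: grok's stock DATA is .*? (lazy) and GREEDYDATA is .* (greedy).
DATA = r".*?"
GREEDYDATA = r".*"

# The grok pattern from the filter above, expanded into a plain regex.
original = re.compile(
    rf"(?P<extras>{DATA})<LoadID(?P<extra>{DATA})>"
    rf"(?P<ASNNumber>{DATA})(?P<behind>{GREEDYDATA})"
)

# Hypothetical payload: the whole document on one line, no whitespace between tags.
sample = "<Root><LoadID>12345</LoadID><Other>x</Other></Root>"

m = original.search(sample)
print(repr(m.group("extras")))     # '<Root>'
print(repr(m.group("ASNNumber")))  # ''
print(repr(m.group("behind")))     # '12345</LoadID><Other>x</Other></Root>'
```

Because `%{DATA:ASNNumber}` is lazy and is followed directly by the greedy `%{GREEDYDATA:behind}`, nothing forces it to consume any characters, so the value lands in `behind` (which the second filter then removes).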
Second filter:

```
filter {
  if "_grokparsefailure" in [tags] {
    drop { }
  } else {
    # on success, remove the message and helper fields to save space
    mutate {
      remove_field => ["message", "timestamp", "extra", "extras", "behind"]
    }
  }
}
```
I am not really sure how to go about it.
Can you please suggest changes to my existing filters?
I only need to read the value of the tag `<LoadID></LoadID>` or `<LoadID xmlns=""></LoadID>`.
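One possible change, offered as an untested sketch rather than a confirmed fix: anchor the capture on the closing tag, e.g. `<LoadID(?:\s[^>]*)?>%{DATA:ASNNumber}</LoadID>` as the grok `match` pattern. The literal `</LoadID>` forces the lazy `DATA` to run up to the closing tag, and `(?:\s[^>]*)?` accepts an optional attribute list such as `xmlns=""`. The Python below checks the equivalent regex against hypothetical single-line payloads; `extract_load_id` is a hypothetical helper name.

```python
import re
from typing import Optional

# Regex equivalent of the suggested (hypothetical, untested) grok pattern:
#   <LoadID(?:\s[^>]*)?>%{DATA:ASNNumber}</LoadID>
# (?:\s[^>]*)? optionally skips attributes such as xmlns="", and the literal
# </LoadID> forces the lazy capture to stop at the closing tag.
LOAD_ID = re.compile(r"<LoadID(?:\s[^>]*)?>(?P<ASNNumber>.*?)</LoadID>")


def extract_load_id(message: str) -> Optional[str]:
    """Return the text of the first <LoadID> element in message, or None."""
    match = LOAD_ID.search(message)
    return match.group("ASNNumber") if match else None


# Hypothetical payloads with no whitespace between consecutive tags.
print(extract_load_id("<Root><LoadID>12345</LoadID><Other>x</Other></Root>"))    # 12345
print(extract_load_id('<Root><LoadID xmlns="">67890</LoadID><Other/></Root>'))  # 67890
print(extract_load_id("<Root><LoadIDX>1</LoadIDX></Root>"))                    # None
```

Since this pattern no longer depends on newlines between tags, it should behave the same for indented and unindented input, though it still assumes the `<LoadID>` element is contained in a single event's `message`.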
That's not the output from stdout { codec => rubydebug }. For best results please answer my questions. I'd also like to see the complete configuration (specifically your inputs).