Hello everyone,
I am trying to parse an XML document and extract some relevant information from it.
The document looks like this:
<log level="INFO" time="Tue Sep 08 11:42:39 EDT 2015" timel="1441726959272" id="1234567890" cat="COMMUNICATION" comp="WEB" host="localhost" req="" app="" usr="" thread="" origin=""><msg><![CDATA[Method=GET URL=http://test:80/testus?OP=gtm&TReq(Clat=[429566997], Clon=[-1372987576], Decoding_Feat=[], Dlat=[0], Dlon=[0], Accept-Encoding=gzip, Accept=*/*) Result(Content-Encoding=[gzip], Content-Length=[2815], ntCoent-Length=[5276], Content-Type=[text/xml; charset=utf-8]) Status=200 Times=TISP:344/CSI:-/Me:0/Total:344]]></msg><info></info><excp></excp></log>
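For reference, this is what the text node of <msg> (the part I later run grok against) actually contains. A quick check with plain Python's ElementTree, using the sample line above:

```python
import xml.etree.ElementTree as ET

# The sample log line from above, as a single string.
line = ('<log level="INFO" time="Tue Sep 08 11:42:39 EDT 2015" '
        'timel="1441726959272" id="1234567890" cat="COMMUNICATION" comp="WEB" '
        'host="localhost" req="" app="" usr="" thread="" origin="">'
        '<msg><![CDATA[Method=GET URL=http://test:80/testus?OP=gtm&'
        'TReq(Clat=[429566997], Clon=[-1372987576], Decoding_Feat=[], Dlat=[0], '
        'Dlon=[0], Accept-Encoding=gzip, Accept=*/*) '
        'Result(Content-Encoding=[gzip], Content-Length=[2815], '
        'ntCoent-Length=[5276], Content-Type=[text/xml; charset=utf-8]) '
        'Status=200 Times=TISP:344/CSI:-/Me:0/Total:344]]></msg>'
        '<info></info><excp></excp></log>')

root = ET.fromstring(line)
print(root.get("level"))      # -> INFO  (what the xpath /log/@level extracts)
print(root.find("msg").text)  # the whole CDATA payload, i.e. msg_txt
```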
I have already created a Logstash pipeline, but the problem lies in the grok filter.
I am trying to extract the Clat, Clon, Dlat and Dlon values from msg_txt.
The problem is that all four fields end up with the same value: Clon, Dlat and Dlon take the same value as Clat, although each of them should find its own value in the CDATA part.
The pipeline looks like this:
input {
  file {
    path => "/ho/war.log.*"
    start_position => "beginning"
  }
}

filter {
  xml {
    store_xml => false
    source => "message"
    xpath => [
      "/log/@level", "level",
      "/log/@time", "time",
      "/log/@timel", "timel",
      "/log/@id", "id",
      "/log/@cat", "cat",
      "/log/@comp", "comp",
      "/log/@host", "host_org",
      "/log/@req", "req",
      "/log/@app", "app",
      "/log/@usr", "usr",
      "/log/@thread", "thread",
      "/log/@origin", "origin",
      "/log/@msg", "msg",
      "/log/msg/text()", "msg_txt"
    ]
  }
  grok {
    break_on_match => false
    match => ["msg_txt", "(?<Clat>=\[(-?\d+)\])"]
    match => ["msg_txt", "(?<Clon>=\[(-?\d+)\])"]
    match => ["msg_txt", "(?<Dlat>=\[(-?\d+)\])"]
    match => ["msg_txt", "(?<Dlon>=\[(-?\d+)\])"]
  }
  mutate {
    gsub => [
      "Clat", "[=\[\]]", "",
      "Clon", "[=\[\]]", "",
      "Dlat", "[=\[\]]", "",
      "Dlon", "[=\[\]]", ""
    ]
  }
}

output {
  elasticsearch {
    hosts => "localhost:9200"
  }
  stdout {}
}
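To rule out Logstash itself, I reproduced the behaviour outside grok with plain regexes. Python writes named groups as (?P<name>…) instead of grok's (?<name>…), but the pattern bodies below are exactly the ones from my grok filter:

```python
import re

# msg_txt as grok sees it (the CDATA payload extracted by /log/msg/text()).
msg_txt = ("Method=GET URL=http://test:80/testus?OP=gtm&TReq(Clat=[429566997], "
           "Clon=[-1372987576], Decoding_Feat=[], Dlat=[0], Dlon=[0], "
           "Accept-Encoding=gzip, Accept=*/*) Result(Content-Encoding=[gzip], "
           "Content-Length=[2815], ntCoent-Length=[5276], "
           "Content-Type=[text/xml; charset=utf-8]) Status=200 "
           "Times=TISP:344/CSI:-/Me:0/Total:344")

# The four patterns differ only in the capture-group name; the regex body
# "=\[(-?\d+)\]" is identical, so each search finds the FIRST occurrence
# of =[number] in the string -- which is Clat's.
matches = {}
for name in ("Clat", "Clon", "Dlat", "Dlon"):
    m = re.search(rf"(?P<{name}>=\[(-?\d+)\])", msg_txt)
    matches[name] = m.group(name)

print(matches)  # every field gets "=[429566997]"
```

So the symptom is reproducible with plain regexes as well, which makes me suspect the patterns rather than the xml filter.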
Do you know where the problem might be?
Best regards