Getting relevant information from CDATA of XML file with Grok


(Ömer Uludağ) #1

Hello everyone,
I am trying to parse an XML document and extract some relevant information from it.
The document looks like this:
<log level="INFO" time="Tue Sep 08 11:42:39 EDT 2015" timel="1441726959272" id="1234567890" cat="COMMUNICATION" comp="WEB" host="localhost" req="" app="" usr="" thread="" origin=""><msg><![CDATA[Method=GET URL=http://test:80/testus?OP=gtm&TReq(Clat=[429566997], Clon=[-1372987576], Decoding_Feat=[], Dlat=[0], Dlon=[0], Accept-Encoding=gzip, Accept=*/*) Result(Content-Encoding=[gzip], Content-Length=[2815], ntCoent-Length=[5276], Content-Type=[text/xml; charset=utf-8]) Status=200 Times=TISP:344/CSI:-/Me:0/Total:344]]></msg><info></info><excp></excp></log>

I have already created an appropriate Logstash pipeline.
However, the problem lies in the grok filter.
I am trying to extract the Clat, Clon, Dlat and Dlon values from msg_txt.
The problem is that all four fields get the same value: Clon, Dlat and Dlon all take the value of Clat.
Normally, each of them should pick up its own value from the CDATA section.

The pipeline looks like this:

input {
  file {
    path => "/ho/war.log.*"
    start_position => "beginning"
  }
}
filter {
  xml {
    store_xml => false
    source => "message"
    xpath => [
      "/log/@level", "level",
      "/log/@time", "time",
      "/log/@timel", "timel",
      "/log/@id", "id",
      "/log/@cat", "cat",
      "/log/@comp", "comp",
      "/log/@host", "host_org",
      "/log/@req", "req",
      "/log/@app", "app",
      "/log/@usr", "usr",
      "/log/@thread", "thread",
      "/log/@origin", "origin",
      "/log/@msg", "msg",
      "/log/msg/text()", "msg_txt"
    ]
  }
  grok {
    break_on_match => false
    match => ["msg_txt", "(?<Clat>=\[(-?\d+)\])"]
    match => ["msg_txt", "(?<Clon>=\[(-?\d+)\])"]
    match => ["msg_txt", "(?<Dlat>=\[(-?\d+)\])"]
    match => ["msg_txt", "(?<Dlon>=\[(-?\d+)\])"]
  }
  mutate {
    gsub => [
      "Clat", "[=\[\]]", "",
      "Clon", "[=\[\]]", "",
      "Dlat", "[=\[\]]", "",
      "Dlon", "[=\[\]]", ""
    ]
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
  }
  stdout {}
}

Do you know where the problem is?

Best regards


(Magnus Bäck) #2

When you specify multiple grok expressions, they won't continue where the last one stopped. Each one starts from the beginning of the string. Your patterns only look for a number within square brackets preceded by an equals sign, so they all pick up the first such value, which is Clat's. Adjusting the filter as below should address that. If you also move the square brackets and equals sign outside what you capture, you won't need the gsub either.

grok {
  break_on_match => false
  match => ["msg_txt", "Clat=\[(?<Clat>-?\d+)\]"]
  match => ["msg_txt", "Clon=\[(?<Clon>-?\d+)\]"]
  match => ["msg_txt", "Dlat=\[(?<Dlat>-?\d+)\]"]
  match => ["msg_txt", "Dlon=\[(?<Dlon>-?\d+)\]"]
}
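To see why the original patterns all return Clat's value, here is the same matching logic as a standalone Ruby sketch. Grok ultimately compiles its patterns to Ruby regular expressions, but this snippet uses plain String#[] rather than grok itself. The message is a shortened copy of the CDATA text from the question, and the helper names are hypothetical.

```ruby
# Shortened copy of the CDATA text from the question.
MSG = "Method=GET URL=http://test:80/testus?OP=gtm&TReq(Clat=[429566997], " \
      "Clon=[-1372987576], Decoding_Feat=[], Dlat=[0], Dlon=[0])"

FIELDS = %w[Clat Clon Dlat Dlon].freeze

# The four original patterns differ only in the capture name, so they are
# effectively the same regex. Each search starts at position 0 and returns
# the first "=[number]" it finds, which belongs to Clat.
def unanchored(msg)
  FIELDS.to_h { |name| [name, msg[/=\[(-?\d+)\]/, 1]] }
end

# Anchoring on the key name ("Clon=[" etc.) means each pattern can only match
# its own key, and capturing just the digits makes the gsub step unnecessary.
def anchored(msg)
  FIELDS.to_h { |name| [name, msg[/#{name}=\[(-?\d+)\]/, 1]] }
end

p unanchored(MSG) # every key maps to "429566997"
p anchored(MSG)   # each key gets its own value
```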

You should also consider using a kv filter.
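The idea behind a kv filter is to split the whole message into key/value pairs in one pass instead of writing one grok pattern per key. The plain-Ruby sketch below only imitates that idea for the Key=[number] pairs in this message. It is not how the kv filter works internally. Which kv options would reproduce it (field and value separators, trimming the brackets) should be checked against the kv filter documentation for your Logstash version. The method name is hypothetical.

```ruby
MSG = "Method=GET URL=http://test:80/testus?OP=gtm&TReq(Clat=[429566997], " \
      "Clon=[-1372987576], Decoding_Feat=[], Dlat=[0], Dlon=[0], " \
      "Accept-Encoding=gzip, Accept=*/*) Result(Content-Encoding=[gzip], " \
      "Content-Length=[2815], ntCoent-Length=[5276], " \
      "Content-Type=[text/xml; charset=utf-8]) Status=200 " \
      "Times=TISP:344/CSI:-/Me:0/Total:344"

# Collects every Key=[integer] pair in a single left-to-right pass.
# Keys may contain letters, digits, "_" and "-". Pairs with empty or
# non-numeric brackets (Decoding_Feat=[], Content-Type=[...]) are skipped.
def bracketed_pairs(msg)
  msg.scan(/([\w-]+)=\[(-?\d+)\]/).to_h
end

pairs = bracketed_pairs(MSG)
p pairs.values_at("Clat", "Clon", "Dlat", "Dlon")
```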


(Ömer Uludağ) #3

Hello Magnus,

thank you very much.
I have a further question.

I have now changed the data types of Clat, Clon, Dlat and Dlon from string to integer with:
convert => { "Clon" => "integer" }

As a next step, I would like to transform the integer values of these fields when they are not empty, like this:

  if [Clat] != '' {
    mutate {
    convert => { "Clat" => "integer" }
    update => { "Clat" => "Clat * 360 / (4294967296)"} 
    }
  }

However, the output for these fields is either '' or 0.

Do you know whether this is caused by the update option?
Thank you very much in advance.

Best regards


(Magnus Bäck) #4

The update option doesn't support arithmetic expressions. You'll have to use a ruby filter for that. Also, don't assume that mutate options are applied in the order you specify them. In this particular case, update happens to be applied before convert. The field is first overwritten with the literal string "Clat * 360 / (4294967296)" and only then converted, which explains why you get zero or an empty string.
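Since update can't evaluate arithmetic, the calculation has to happen in Ruby. Below is a standalone sketch of the conversion a ruby filter's code could perform. The constant and method names are hypothetical. How you read and write the event field depends on your Logstash version: older releases use hash-style event['Clat'], while 5.0 and later use event.get('Clat') / event.set(...). Note the 360.0 literal. With integers only, Ruby truncates the result.

```ruby
# Scale from the question: 360 degrees spread over 2**32 raw units.
# The 360.0 literal forces floating-point division.
DEGREES_PER_UNIT = 360.0 / 4_294_967_296

# Converts one raw coordinate (String or Integer) to degrees.
# Returns nil for missing or empty values so the caller can skip the field,
# mirroring the `if [Clat] != ''` check in the question.
# Integer() raises ArgumentError for non-numeric text.
def to_degrees(raw)
  text = raw.to_s.strip
  return nil if text.empty?
  Integer(text, 10) * DEGREES_PER_UNIT
end

# With integers only, Ruby truncates: 429566997 * 360 / 4294967296 == 36
p to_degrees("429566997")   # about 36.0059
p to_degrees("-1372987576") # about -115.0825
p to_degrees("")            # nil
```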


(Ömer Uludağ) #5

Hello Magnus,

thank you very much. I have created a ruby filter for the calculations, and it works.
My aim is to show my location values on a tile map in Kibana 4. For that, I have created this:
mutate { add_field => [ "[location]", "%{Clat}" ] add_field => [ "[location]", "%{Clon}" ] convert => [ "[location]", "float" ] }

Now I have a location field. Can I also declare its type as geo_point in Logstash?
I would like to rely on ES's automatic mapping.


(Magnus Bäck) #6

There's no such thing as a geo_point data type on the Logstash side. Configure your ES mapping correctly and make sure you format the field in Logstash in such a way that ES will be able to parse it as geo_point.
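On the Logstash side, the job is only to produce a value in one of the layouts Elasticsearch can parse as a geo_point. Without a geo_point mapping, ES's dynamic mapping treats an array of floats as a plain numeric field, which is the [double] in the error later in this thread. One pitfall: the array form must be [lon, lat]. The mutate in the question adds Clat before Clon, which gives [lat, lon], the reverse. The sketch below builds each layout in plain Ruby, for example as what a ruby filter could assign to location. The function name is hypothetical, and the sample coordinates are the converted Clat/Clon values.

```ruby
# Builds the layouts Elasticsearch accepts for a geo_point value from a
# latitude and longitude in degrees. Any one of them works; the field itself
# still has to be mapped as geo_point on the ES side.
def geo_point_formats(lat, lon)
  {
    object: { "lat" => lat, "lon" => lon }, # explicit keys, no ordering pitfall
    string: "#{lat},#{lon}",                 # string form is "lat,lon"
    array:  [lon, lat]                       # array form is [lon, lat] (GeoJSON order)
  }
end

p geo_point_formats(36.0059, -115.0825)
```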


(Ömer Uludağ) #7

Hello Magnus,

I read your answer to another, similar question, where you suggested setting manage_template to false and creating an explicit mapping.
My explicit mapping looks like this:
curl -XPOST localhost:9200/logstash-2015.12.14 -d '{ "mappings": { "location": { "properties": { "name": { "type": "string" }, "location": { "type": "geo_point" } } } } }'

Do I now have to tell Logstash to put the data into the logstash-2015.12.14 index? At the moment, Logstash does not put any data into logstash-2015.12.14.

My output looks like this:
output { elasticsearch { hosts => "localhost:9200" manage_template => false } }

I think the reason the data is not being indexed is this error message:
=>{"create"=>{"_index"=>"logstash-2015.12.14", "_type"=>"logs", "_id"=>"AVGivuEbxX_o2BuLkbb4", "status"=>400, "error"=>{"type"=>"illegal_argument_exception", "reason"=>"Mapper for [location] conflicts with existing mapping in other types:\n[mapper [location] cannot be changed from type [geo_point] to [double]]"}}}, :level=>:warn}


(Magnus Bäck) #8

Don't set mappings for a particular index. Use index templates, just like Logstash does. An index template holds settings and mappings (for example) that are applied to every newly created index whose name matches a particular pattern. Again, copy the original Logstash template file (e.g. /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-1.0.5-java/lib/logstash/outputs/elasticsearch/elasticsearch-template.json) and use it as a starting point.

The error message at the end means what it says; a given field can't have different mappings for different types in the same index. You may have to reindex to fix this. Or, since you have daily indexes, maybe the problem will be gone tomorrow.
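Here is a minimal sketch of the location part of such a template body, built in Ruby and printed as JSON. It assumes the Elasticsearch 2.x template format that Logstash's bundled template used at the time: "template" holds the index-name pattern, and a "_default_" mapping gives every document type, including Logstash's "logs", the same location mapping. That shared mapping avoids the conflict where one type had geo_point and another had double. Newer Elasticsearch versions use index_patterns instead and no longer support _default_. On its own, this template would drop the rest of Logstash's template (dynamic templates, .raw fields and so on). In practice, merge it into a copy of Logstash's file, as suggested above. The method name is hypothetical. The JSON could be uploaded with PUT to _template/<name> or referenced from the elasticsearch output's template option.

```ruby
require "json"

# Hypothetical minimal template body in the Elasticsearch 2.x format:
# "template" is the index-name pattern, and "_default_" applies the mapping
# to every document type in matching indexes created after the upload.
def geo_template(pattern = "logstash-*")
  {
    "template" => pattern,
    "mappings" => {
      "_default_" => {
        "properties" => {
          "location" => { "type" => "geo_point" }
        }
      }
    }
  }
end

puts JSON.pretty_generate(geo_template)
```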

