Getting relevant information from CDATA of XML file with Grok

Omer_Uludag · December 14, 2015, 7:07pm

Hello everyone,
I am trying to parse an XML document and extract some information from it.
The document looks like this:
<log level="INFO" time="Tue Sep 08 11:42:39 EDT 2015" timel="1441726959272" id="1234567890" cat="COMMUNICATION" comp="WEB" host="localhost" req="" app="" usr="" thread="" origin=""><msg><![CDATA[Method=GET URL=http://test:80/testus?OP=gtm&TReq(Clat=[429566997], Clon=[-1372987576], Decoding_Feat=[], Dlat=[0], Dlon=[0], Accept-Encoding=gzip, Accept=*/*) Result(Content-Encoding=[gzip], Content-Length=[2815], ntCoent-Length=[5276], Content-Type=[text/xml; charset=utf-8]) Status=200 Times=TISP:344/CSI:-/Me:0/Total:344]]></msg><info></info><excp></excp></log>

I have already created a Logstash pipeline for this.
However, the problem lies in the grok filter.
I am trying to extract the Clat, Clon, Dlat and Dlon values from msg_txt.
The problem is that all four fields end up with the same value: Clon, Dlat and Dlon take the same value as Clat.
Each of them should instead pick up its own value from the CDATA part.

The pipeline looks like this:

input {
  file {
    path => "/ho/war.log.*"
    start_position => "beginning"
  }
}
filter {
  xml {
    store_xml => false
    source => "message"
    xpath => [
      "/log/@level", "level",
      "/log/@time", "time",
      "/log/@timel", "timel",
      "/log/@id", "id",
      "/log/@cat", "cat",
      "/log/@comp", "comp",
      "/log/@host", "host_org",
      "/log/@req", "req",
      "/log/@app", "app",
      "/log/@usr", "usr",
      "/log/@thread", "thread",
      "/log/@origin", "origin",
      "/log/@msg", "msg",
      "/log/msg/text()", "msg_txt"
    ]
  }
  grok {
    break_on_match => false
    match => ["msg_txt", "(?<Clat>=\[(-?\d+)\])"]
    match => ["msg_txt", "(?<Clon>=\[(-?\d+)\])"]
    match => ["msg_txt", "(?<Dlat>=\[(-?\d+)\])"]
    match => ["msg_txt", "(?<Dlon>=\[(-?\d+)\])"]
  }
  mutate {
    gsub => [
      "Clat", "[=\[\]]", "",
      "Clon", "[=\[\]]", "",
      "Dlat", "[=\[\]]", "",
      "Dlon", "[=\[\]]", ""
    ]
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
  }
  stdout {}
}

Do you know where the problem might be?

Best regards

magnusbaeck · December 14, 2015, 7:16pm

When you specify multiple grok expressions they won't continue where the last one stopped. They'll all start from the beginning of the string. Since you're only looking for a number within square brackets they'll all pick up the Clat string. Adjusting the filter as below should address that, and if you move the square brackets and equal sign outside what you capture you won't need the gsub either.

grok {
  break_on_match => false
  match => ["msg_txt", "Clat=\[(?<Clat>-?\d+)\]"]
  match => ["msg_txt", "Clon=\[(?<Clon>-?\d+)\]"]
  match => ["msg_txt", "Dlat=\[(?<Dlat>-?\d+)\]"]
  match => ["msg_txt", "Dlon=\[(?<Dlon>-?\d+)\]"]
}

You should also consider using a kv filter.
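
To see the difference outside Logstash, here is a minimal Ruby sketch (grok patterns use the same named-group syntax as Ruby regular expressions). The helper names first_bracketed_number and extract_field are made up for illustration, and the sample string is a shortened copy of the CDATA text from the question.

```ruby
# Shortened copy of the CDATA text from the question.
msg = "TReq(Clat=[429566997], Clon=[-1372987576], Dlat=[0], Dlon=[0])"

# Original style: the pattern contains no field name, so every search starts
# at the beginning of the string and finds the first bracketed number.
def first_bracketed_number(text)
  text[/=\[(-?\d+)\]/, 1]
end

# Corrected style: the literal field name in front of the brackets ties each
# pattern to its own value. Returns nil when the field is absent.
def extract_field(text, name)
  m = text.match(/#{Regexp.escape(name)}=\[(?<value>-?\d+)\]/)
  m && m[:value]
end

%w[Clat Clon Dlat Dlon].each do |name|
  puts "#{name}: unanchored=#{first_bracketed_number(msg)} anchored=#{extract_field(msg, name)}"
end
```

The unanchored search reports the Clat number for all four names, while the anchored search returns a different value per field.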

Omer_Uludag · December 14, 2015, 8:02pm

Hello Magnus,

Thank you very much.
I have a further question.

I have now changed the data types of Clat, Clon, Dlat and Dlon from string to integer with:
convert => { "Clon" => "integer" }

As a next step, I would like to change the integer value of these fields if they are not empty, like this:

  if [Clat] != '' {
    mutate {
    convert => { "Clat" => "integer" }
    update => { "Clat" => "Clat * 360 / (4294967296)"} 
    }
  }

However, the output for these fields is either '' or 0.

Do you know whether this is because of the update option?
Thank you very much in advance.

Best regards

magnusbaeck · December 14, 2015, 8:18pm

The update option doesn't support arithmetic expressions. You'll have to use a ruby filter for that. Also, don't assume that mutate options are applied in the order specified. In this particular case it so happens that update is applied first and then convert (see below) which explains why you get zero or an empty string.

github.com

logstash-plugins/logstash-filter-mutate/blob/master/lib/logstash/filters/mutate.rb#L208-L225


def register
  valid_conversions = %w(string integer float boolean)
  # TODO(sissel): Validate conversion requests if provided.
  @convert.nil? or @convert.each do |field, type|
    if !valid_conversions.include?(type)
      raise LogStash::ConfigurationError, I18n.t(
        "logstash.agent.configuration.invalid_plugin_register",
        :plugin => "filter",
        :type => "mutate",
        :error => "Invalid conversion type '#{type}', expected one of '#{valid_conversions.join(',')}'"
      )
    end
  end


  @gsub_parsed = []
  @gsub.nil? or @gsub.each_slice(3) do |field, needle, replacement|
    if [field, needle, replacement].any? {|n| n.nil?}
      raise LogStash::ConfigurationError, I18n.t(
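
For the arithmetic itself, a minimal sketch in plain Ruby of what the ruby filter would need to compute. The helper name raw_to_degrees is hypothetical, and the scale factor is the one from the question. Inside a ruby filter the value would be read from and written back to the event; how that is done depends on the Logstash version (event['Clat'] on 2.x, event.get and event.set from 5.0 on).

```ruby
# Converts a raw coordinate into degrees using the factor from the question:
# 4294967296 (2**32) raw units correspond to 360 degrees.
# Returns nil for missing or empty input so the caller can skip the field.
def raw_to_degrees(raw)
  return nil if raw.nil? || raw.to_s.strip.empty?
  # 360.0 forces floating-point arithmetic; with integers only, Ruby's
  # division would round the result down to a whole number.
  Integer(raw.to_s, 10) * 360.0 / 4294967296
end

puts raw_to_degrees("429566997")
puts raw_to_degrees("-1372987576")
puts raw_to_degrees("").inspect
```

Doing the conversion and the calculation in one place also avoids relying on the order in which mutate applies its options.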

Omer_Uludag · December 14, 2015, 8:56pm

Hello Magnus,

Thank you very much. I have created a ruby filter for the calculations, and it works.
My aim is to use the tile map in Kibana 4 for my location values, so I have created this:
mutate {
  add_field => [ "[location]", "%{Clat}" ]
  add_field => [ "[location]", "%{Clon}" ]
  convert => [ "[location]", "float" ]
}

Now I have a location field. Can I also declare the type of location as geo_point in Logstash?
I would like to use ES's automatic mapping.

magnusbaeck · December 14, 2015, 9:26pm

There's no such thing as a geo_point data type on the Logstash side. Configure your ES mapping correctly and make sure you format the field in Logstash in such a way that ES will be able to parse it as geo_point.
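
As a rough illustration of "formatting the field so that ES can parse it", here is a small Ruby sketch that builds the object form of a geo_point. The helper name build_geo_point is hypothetical and the range check is an added assumption, not something the thread requires. Elasticsearch also accepts an array, but the array form is ordered [lon, lat], so adding Clat first and Clon second to an array field would swap the coordinates.

```ruby
# Builds the object form of a geo_point, which names both values explicitly
# and so avoids the [lon, lat] ordering pitfall of the array form.
# Returns nil when a value is missing or outside the valid range.
def build_geo_point(lat, lon)
  return nil if lat.nil? || lon.nil?
  return nil unless lat.between?(-90.0, 90.0) && lon.between?(-180.0, 180.0)
  { "lat" => lat, "lon" => lon }
end

p build_geo_point(36.0, -115.08)
p build_geo_point(120.0, 0.0)
```

In a ruby filter the resulting hash would be assigned to the location field, and the geo_point mapping on the Elasticsearch side does the rest.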

Omer_Uludag · December 14, 2015, 11:08pm

Hello Magnus,

I read your answer to another similar question. You suggested setting manage_template to false and creating an explicit mapping.
My explicit mapping looks like this:
curl -XPOST localhost:9200/logstash-2015.12.14 -d '{ "mappings": { "location": { "properties": { "name": { "type": "string" }, "location": { "type": "geo_point" } } } } }'

Do I now have to tell Logstash to put the data into the logstash-2015.12.14 index? At the moment Logstash does not write any data to it.

My output looks like this:
output {
  elasticsearch {
    hosts => "localhost:9200"
    manage_template => "false"
  }
}

I think the reason the data is not written is this error message:
=>{"create"=>{"_index"=>"logstash-2015.12.14", "_type"=>"logs", "_id"=>"AVGivuEbxX_o2BuLkbb4", "status"=>400, "error"=>{"type"=>"illegal_argument_exception", "reason"=>"Mapper for [location] conflicts with existing mapping in other types:\n[mapper [location] cannot be changed from type [geo_point] to [double]]"}}}, :level=>:warn}

magnusbaeck · December 15, 2015, 6:56am

Don't set mappings for a particular index; use index templates, just like Logstash does. An index template defines settings and mappings that are applied to every newly created index whose name matches a particular pattern. Again, copy the original Logstash file (e.g. /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-1.0.5-java/lib/logstash/outputs/elasticsearch/elasticsearch-template.json) and use it as a starting point.

The error message at the end means what it says; a given field can't have different mappings for different types in the same index. You may have to reindex to fix this. Or, since you have daily indexes, maybe the problem will be gone tomorrow.
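
To make the shape of such a template concrete, here is a minimal Ruby sketch that prints only the part relevant to this thread. It assumes the Elasticsearch 2.x template format of the time (a "template" pattern key and a "_default_" mapping); later Elasticsearch versions changed this format. The function name location_template is made up, and the fragment would be merged into a copy of the original Logstash template rather than replace it.

```ruby
require "json"

# Returns a template fragment that maps the location field as geo_point for
# every index whose name matches the given pattern (ES 2.x-style format).
def location_template(pattern = "logstash-*")
  {
    "template" => pattern,
    "mappings" => {
      # "_default_" applies to all document types, including "logs".
      "_default_" => {
        "properties" => {
          "location" => { "type" => "geo_point" }
        }
      }
    }
  }
end

puts JSON.pretty_generate(location_template)
```

Because the template is applied only when an index is created, it takes effect for the next daily index, not for one that already exists.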
