Logstash unable to parse xml file


(safaa) #1

I have a simple XML file and I am trying to parse it with Logstash, but Logstash cannot resolve the field given to split:

my xml file is:

    <xmldata>
     <head1>
      <key1>Value1</key1>
      <key2>Value2</key2>
      <id>0001</id>
      <date>01-01-2016 09:00:00</date>
     </head1>
     <head1>
      <key3>Value3</key3>
     </head1>
    </xmldata> 

My config file is:

input {
  file {
    path => "/home/safaa/Documents/nessus/validate.xml"
    start_position => beginning
    sincedb_path => "/dev/null"
    codec => multiline {
      pattern => "^<\?xmldata .*\>"
      negate => true
      what => "previous"
      auto_flush_interval => 1
    }
  }
}

filter {
  xml {
    store_xml => false
    source => "message"
    target => "xml_content"
  }

  split {
    field => "xml_content[head1][key1]"
  }

  mutate {
    rename => {
      "xml_content[head1][key1]" => "var1"
    }
  }
}

output {
  stdout { codec => rubydebug }
  elasticsearch {
    index => "logstash-xml"
    hosts => ["127.0.0.1:9200"]
    document_id => "%{[id]}"
    document_type => "xmlfiles"
  }
}

My Logstash logs:

[2018-10-21T15:17:40,666][INFO ][logstash.pipeline        ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}

[2018-10-21T15:17:41,465][INFO ][logstash.outputs.elasticsearch] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://127.0.0.1:9200/]}}

[2018-10-21T15:17:41,482][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://127.0.0.1:9200/, :path=>"/"}

[2018-10-21T15:17:41,905][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://127.0.0.1:9200/"}

[2018-10-21T15:17:42,009][INFO ][logstash.outputs.elasticsearch] ES Output version determined {:es_version=>6}

[2018-10-21T15:17:42,014][WARN ][logstash.outputs.elasticsearch] Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>6}

[2018-10-21T15:17:42,074][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//127.0.0.1:9200"]}

[2018-10-21T15:17:42,109][INFO ][logstash.outputs.elasticsearch] Using mapping template from {:path=>nil}

[2018-10-21T15:17:42,151][INFO ][logstash.outputs.elasticsearch] Attempting to install template {:manage_template=>{"template"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}}

[2018-10-21T15:17:43,863][INFO ][logstash.pipeline        ] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x2d7d2c2a run>"}

[2018-10-21T15:17:43,997][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}

[2018-10-21T15:17:44,021][INFO ][filewatch.observingtail  ] START, creating Discoverer, Watch with file and sincedb collections

[2018-10-21T15:17:45,052][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}

[2018-10-21T15:17:47,216][WARN ][org.logstash.FieldReference] Detected ambiguous Field Reference `xml_content[head1][key1]`, which we expanded to the path `[xml_content, head1, key1]`; in a future release of Logstash, ambiguous Field References will not be expanded.

[2018-10-21T15:17:47,233][WARN ][logstash.filters.split   ] Only String and Array types are splittable. field:xml_content[head1][key1] is of type = NilClass

#2

check this,

hope this helps you


#3

I could not get it running with more than one level of nested XML objects. The problem is the filter -> split -> field definition:

split{
    field => "xml_content[head1][key1]"
 }

It works with xml_content[head1], i.e. when the field is nested once, but not when it is nested twice.

I found a workaround using xpath:

filter {
    xml {
        store_xml => false
        source => "message"
        target => "xml_content"
        xpath => ["/xmldata/head1/key1/text()","key1"]
    }
    split{
        field => "key1"
    }
}

This puts the text of the key1 XML nodes into an array field called key1, which split then breaks into one event per element.

Sidenote1: I think your input -> file -> codec -> multiline config is wrong. Your regex is ^<\?xmldata .*\>, which requires a literal <?xmldata followed by a space, so it matches neither <xmldata> nor </xmldata>. Fix it or get rid of it.
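
One way the codec could be fixed, as a minimal sketch rather than a verified config. It assumes the opening <xmldata> tag always starts a line (leading whitespace allowed) and that each <xmldata> document should become one event. Every option is one already used in the question; only the pattern is changed.

```
input {
  file {
    path => "/home/safaa/Documents/nessus/validate.xml"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => multiline {
      # Assumption: a line opening <xmldata> marks the start of a new event ...
      pattern => "^\s*<xmldata>"
      # ... and every line that does NOT match is appended to the previous one.
      negate => true
      what => "previous"
      # Flush the buffered event after 1 second without new lines, so the
      # last document in the file does not stay stuck in the codec.
      auto_flush_interval => 1
    }
  }
}
```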

Sidenote2: Please change the topic title to reflect the actual problem. It suggests a problem with XML parsing, but the issue is really with splitting.

Edit
Got it working. You need the following split definition:

filter {
    split{
        field => "[xml_content][head1][0][key1]"
    }
}

There are multiple head1 XML nodes, so you must specify which one to take. Hence the [0], which tells Logstash to traverse into the first head1 node.
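
Putting it together, a sketch of the whole filter section, not a verified config. It rests on two assumptions that are not spelled out above: store_xml has to be true (the question sets it to false, in which case the parsed tree is not written to target and the field stays nil, as the NilClass warning in the log suggests), and the xml filter keeps its default force_array behaviour, so key1 arrives as an array that split accepts. The mutate block is carried over from the question with the field reference adjusted.

```
filter {
  xml {
    source => "message"
    target => "xml_content"
    # Assumption: must be true (the default) so that the parsed tree
    # is actually stored in xml_content.
    store_xml => true
  }

  split {
    # Bracket every path segment; [0] selects the first head1 node.
    field => "[xml_content][head1][0][key1]"
  }

  mutate {
    # Same rename as in the question, using the unambiguous field reference.
    rename => { "[xml_content][head1][0][key1]" => "var1" }
  }
}
```

Writing every segment in brackets also avoids the "ambiguous Field Reference" warning shown in the log.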

