Thanks for the explanation - very helpful! My config now looks like this:
input {
  file {
    path => [ "C:/temp/SCAP/*.xml" ]
    start_position => "beginning"
    codec => multiline {
      pattern => "^ZsExDrC"
      what => "previous"
      negate => true
      auto_flush_interval => 2
      max_lines => 50000
    }
  }
}

filter {
  xml {
    namespaces => {
      "cdf" => "http://checklists.nist.gov/xccdf/1.2"
      "xsi" => "http://www.w3.org/2001/XMLSchema-instance"
      "dc" => "http://purl.org/dc/elements/1.1/"
    }
    source => "message"
    target => "doc"
    xpath => {
      "/cdf:Benchmark/cdf:title/text()" => "benchmark"
      "/cdf:Benchmark/cdf:plain-text[@id='release-info']/text()" => "release-info"
      "/cdf:Benchmark/cdf:TestResult/@start-time" => "[@metadata][timestamp]"
      "/cdf:Benchmark/cdf:TestResult/cdf:target/text()" => "host.name"
      "/cdf:Benchmark/cdf:TestResult/cdf:target-address[normalize-space()][1]/text()" => "host.ip"
      "/cdf:Benchmark/cdf:TestResult/cdf:target-facts/cdf:fact[@name='urn:scap:fact:asset:identifier:os_name']/text()" => "host.os.name"
      "/cdf:Benchmark/cdf:TestResult/cdf:target-facts/cdf:fact[@name='urn:scap:fact:asset:identifier:processor']/text()" => "host.cpu"
      "/cdf:Benchmark/cdf:TestResult/cdf:target-facts/cdf:fact[@name='urn:scap:fact:asset:identifier:processor_mhz']/text()" => "host.cpu.speed"
      "/cdf:Benchmark/cdf:TestResult/cdf:target-facts/cdf:fact[@name='urn:scap:fact:asset:identifier:physical_memory']/text()" => "host.memory"
      "/cdf:Benchmark/cdf:TestResult/cdf:score[1]/text()" => "vulnerability.score.base"
    }
  }
  date { match => [ "[@metadata][timestamp][0]", "YYYY-MM-dd'T'HH:mm:ss" ] }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "scap-results-%{+YYYY.MM.dd}"
  }
}
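For context on why the date filter references [@metadata][timestamp][0]: as I understand it, every xpath match in the xml filter comes back as an array, even when there is only one hit. A toy Python equivalent of one of the lookups (the XCCDF fragment below is a made-up minimal sample, not real scan output):

```python
import xml.etree.ElementTree as ET

NS = {"cdf": "http://checklists.nist.gov/xccdf/1.2"}

# Hypothetical minimal XCCDF fragment, just enough to run one xpath against
sample = """\
<cdf:Benchmark xmlns:cdf="http://checklists.nist.gov/xccdf/1.2">
  <cdf:TestResult start-time="2022-01-17T17:15:01">
    <cdf:target>Finlandia</cdf:target>
  </cdf:TestResult>
</cdf:Benchmark>"""

root = ET.fromstring(sample)

# Like Logstash's xpath option, findall always yields a LIST of matches,
# so even a single <cdf:target> lands in the event as a one-element array
targets = [t.text for t in root.findall("cdf:TestResult/cdf:target", NS)]
print(targets)  # ['Finlandia']
```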
However, I'm still not getting a timestamp field or any of the other parsed fields; the index gets created but ends up with zero documents. I do get this warning in logstash-plain.log (I added line breaks to make it more legible):
[2022-01-18T13:17:56,173][WARN ][logstash.outputs.elasticsearch][scap-results]
[fad371fb3a2f1bced415913c622407598fadd3ce093c68958e81938693f4259c]
Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil,
:_index=>"scap-results-2022.01.17", :routing=>nil}, {
"@timestamp"=>2022-01-17T16:15:01.000Z, "host.ip"=>["192.168.40.208"],
"message"=>"<?xml version=\"1.0\" ....<the full XML document>...............
"tags"=>["multiline"], "benchmark"=>["Windows 10 Security Technical Implementation Guide"], "host.cpu"=>["AMD A6-5400K APU with Radeon(tm) HD Graphics "], "host"=>"Finlandia", "host.memory"=>["16384"]...........
"error"=>{"type"=>"illegal_argument_exception",
"reason"=>"can't merge a non object mapping [doc.Value.value] with an object mapping"}}}}
From that, you can see that it is in fact parsing the fields (for instance host.ip, host.cpu, etc., as well as @timestamp), but it is not able to index them. I found this related discussion, but am not clear on how to use that solution, since that post seems more to do with machine learning jobs. In another discussion, @xeraa said:
Either have a concrete value or a subdocument in a field, but don't mix them.
But I don't understand this. My fields do have concrete values (taken from either an XML element or an attribute).
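If I'm reading that advice right, the conflict would look something like the toy sketch below (hypothetical, not Elasticsearch's actual mapper): once a field name is mapped as a scalar, the same name can't later hold a subdocument. I've reused the host field from my own log excerpt as the example:

```python
def infer_mapping(doc, mapping=None):
    """Toy stand-in for Elasticsearch dynamic mapping: record whether each
    top-level field is a scalar ("value") or an object ("object"), and fail
    on a mismatch, like the illegal_argument_exception in the log above."""
    mapping = {} if mapping is None else mapping
    for field, value in doc.items():
        kind = "object" if isinstance(value, dict) else "value"
        if mapping.setdefault(field, kind) != kind:
            raise ValueError(
                f"can't merge a non object mapping [{field}] with an object mapping"
            )
    return mapping

m = infer_mapping({"host": "Finlandia"})  # "host" mapped as a concrete value
try:
    # now "host" would need to be an object holding an "ip" subfield
    infer_mapping({"host": {"ip": "192.168.40.208"}}, m)
except ValueError as e:
    print(e)  # can't merge a non object mapping [host] with an object mapping
```

Is that the kind of mixing the quote is warning about, and if so, where is my config doing it?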