Thanks for the explanation - very helpful! My config now looks like this:
input {
  file {
    path => [ "C:/temp/SCAP/*.xml" ]
    start_position => "beginning"
    codec => multiline {
      pattern => "^ZsExDrC"
      what => "previous"
      negate => true
      auto_flush_interval => 2
      max_lines => 50000
    }
  }
}

filter {
  xml {
    namespaces => {
      "cdf" => "http://checklists.nist.gov/xccdf/1.2"
      "xsi" => "http://www.w3.org/2001/XMLSchema-instance"
      "dc" => "http://purl.org/dc/elements/1.1/"
    }
    source => "message"
    target => "doc"
    xpath => {
      "/cdf:Benchmark/cdf:title/text()" => "benchmark"
      "/cdf:Benchmark/cdf:plain-text[@id='release-info']/text()" => "release-info"
      "/cdf:Benchmark/cdf:TestResult/@start-time" => "[@metadata][timestamp]"
      "/cdf:Benchmark/cdf:TestResult/cdf:target/text()" => "host.name"
      "/cdf:Benchmark/cdf:TestResult/cdf:target-address[normalize-space()][1]/text()" => "host.ip"
      "/cdf:Benchmark/cdf:TestResult/cdf:target-facts/cdf:fact[@name='urn:scap:fact:asset:identifier:os_name']/text()" => "host.os.name"
      "/cdf:Benchmark/cdf:TestResult/cdf:target-facts/cdf:fact[@name='urn:scap:fact:asset:identifier:processor']/text()" => "host.cpu"
      "/cdf:Benchmark/cdf:TestResult/cdf:target-facts/cdf:fact[@name='urn:scap:fact:asset:identifier:processor_mhz']/text()" => "host.cpu.speed"
      "/cdf:Benchmark/cdf:TestResult/cdf:target-facts/cdf:fact[@name='urn:scap:fact:asset:identifier:physical_memory']/text()" => "host.memory"
      "/cdf:Benchmark/cdf:TestResult/cdf:score[1]/text()" => "vulnerability.score.base"
    }
  }
  date { match => [ "[@metadata][timestamp][0]", "YYYY-MM-dd'T'HH:mm:ss" ] }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "scap-results-%{+YYYY.MM.dd}"
  }
}
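For context on why the date filter references [@metadata][timestamp][0]: as I understand it, every xpath match in the xml filter comes back as an array, even when there is only one hit. A toy Python equivalent of one of the lookups (the XCCDF fragment below is a made-up minimal sample, not real scan output):

```python
import xml.etree.ElementTree as ET

NS = {"cdf": "http://checklists.nist.gov/xccdf/1.2"}

# Hypothetical minimal XCCDF fragment, just enough to run one xpath against
sample = """\
<cdf:Benchmark xmlns:cdf="http://checklists.nist.gov/xccdf/1.2">
  <cdf:TestResult start-time="2022-01-17T17:15:01">
    <cdf:target>Finlandia</cdf:target>
  </cdf:TestResult>
</cdf:Benchmark>"""

root = ET.fromstring(sample)

# Like Logstash's xpath option, findall always yields a LIST of matches,
# so even a single <cdf:target> lands in the event as a one-element array
targets = [t.text for t in root.findall("cdf:TestResult/cdf:target", NS)]
print(targets)  # ['Finlandia']
```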
However, I'm still not getting a timestamp field or any of the other parsed fields; the index gets created but ends up with zero documents. I do get this warning in logstash-plain.log (I added line breaks to make it more legible):
[2022-01-18T13:17:56,173][WARN ][logstash.outputs.elasticsearch][scap-results]
[fad371fb3a2f1bced415913c622407598fadd3ce093c68958e81938693f4259c]
Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil,
:_index=>"scap-results-2022.01.17", :routing=>nil}, {
"@timestamp"=>2022-01-17T16:15:01.000Z, "host.ip"=>["192.168.40.208"],
"message"=>"<?xml version=\"1.0\" ....<the full XML document>...............
"tags"=>["multiline"], "benchmark"=>["Windows 10 Security Technical Implementation Guide"], "host.cpu"=>["AMD A6-5400K APU with Radeon(tm) HD Graphics "], "host"=>"Finlandia", "host.memory"=>["16384"]...........
"error"=>{"type"=>"illegal_argument_exception",
"reason"=>"can't merge a non object mapping [doc.Value.value] with an object mapping"}}}}
From that, you can see that it is in fact parsing the fields (for instance host.ip, host.cpu, etc., as well as @timestamp), but it is not able to index them. I found this related discussion, but am not clear on how to use that solution, since that post seems more to do with machine learning jobs. In another discussion, @xeraa said:
Either have a concrete value or a subdocument in a field, but don't mix them.
But I don't understand this. My fields do have concrete values (taken from either an XML element or an attribute).
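If I'm reading that advice right, the conflict would look something like the toy sketch below (hypothetical, not Elasticsearch's actual mapper): once a field name is mapped as a scalar, the same name can't later hold a subdocument. I've reused the host field from my own log excerpt as the example:

```python
def infer_mapping(doc, mapping=None):
    """Toy stand-in for Elasticsearch dynamic mapping: record whether each
    top-level field is a scalar ("value") or an object ("object"), and fail
    on a mismatch, like the illegal_argument_exception in the log above."""
    mapping = {} if mapping is None else mapping
    for field, value in doc.items():
        kind = "object" if isinstance(value, dict) else "value"
        if mapping.setdefault(field, kind) != kind:
            raise ValueError(
                f"can't merge a non object mapping [{field}] with an object mapping"
            )
    return mapping

m = infer_mapping({"host": "Finlandia"})  # "host" mapped as a concrete value
try:
    # now "host" would need to be an object holding an "ip" subfield
    infer_mapping({"host": {"ip": "192.168.40.208"}}, m)
except ValueError as e:
    print(e)  # can't merge a non object mapping [host] with an object mapping
```

Is that the kind of mixing the quote is warning about, and if so, where is my config doing it?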