Hi everyone,
I'm dealing with a huge XML file and I'm trying to proceed step by step.
At the moment I'm having trouble with the way Logstash handles multiline input and arrays.
This is the simplified XML I'm trying to parse:
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <hostname>crt-mon</hostname>
  <date>2016.11.01</date>
  <time>01:23:04 CET</time>
  <release>11.6</release>
  <version>2.1</version>
</properties>
First things first, I tried this Logstash configuration:
input {
  file {
    path => "/srv/logstash/logs/test_multiline.xml"
    type => "test-xml"
    start_position => "beginning"
    codec => multiline {
      pattern => "^<\?properties .*\>"
      negate => "true"
      what => "previous"
    }
  }
}
filter {
  xml {
    store_xml => "false"
    source => "message"
    xpath => [
      "/properties/hostname/text()", "hostname",
      "/properties/date/text()", "date",
      "/properties/time/text()", "time",
      "/properties/release/text()", "release",
      "/properties/version/text()", "version"
    ]
  }
  mutate {
    replace => {"hostname" => "%{[hostname][0]}" }
    replace => {"date" => "%{[date][0]}" }
    replace => {"time" => "%{[time][0]}" }
    replace => {"release" => "%{[release][0]}" }
    replace => {"version" => "%{[version][0]}" }
  }
}
output { stdout { codec => rubydebug } }
Unfortunately nothing seems to happen. My guess is that Logstash is waiting for the next line, because when I stop the pipeline I can see that the event has been parsed:
{:timestamp=>"2016-10-31T18:54:27.754000+0000", :message=>"Pipeline main started"}
{:timestamp=>"2016-10-31T18:54:38.473000+0000", :message=>"SIGINT received. Shutting down the agent.", :level=>:warn}
{:timestamp=>"2016-10-31T18:54:38.480000+0000", :message=>"stopping pipeline", :id=>"main"}
{
"@timestamp" => "2016-10-31T18:54:39.073Z",
"message" => "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<properties>\n <hostname>crt-mon</hostname>\n <date>2016.11.01</date>\n <time>01:23:04 CET</time>\n <release>11.6</release>\n <version>2.1</version>\n</properties>",
"@version" => "1",
"tags" => [
[0] "multiline"
],
"path" => "/srv/logstash/logs/test_multiline.xml",
"host" => "4d8280939c35",
"type" => "test-xml",
"hostname" => "crt-mon",
"date" => "2016.11.01",
"time" => "01:23:04 CET",
"release" => "11.6",
"version" => "2.1"
}
{:timestamp=>"2016-10-31T18:54:39.826000+0000", :message=>"Pipeline main has been shutdown"}
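One possible explanation, offered as a sketch rather than a confirmed fix: with negate => "true" and what => "previous", every line that does not match the pattern is appended to the previous one, and the codec only emits the event once it sees a line that does match. The pattern above never matches any line of the sample file, so the event stays buffered until shutdown. Assuming every document starts with an <?xml ...?> declaration and that the installed multiline codec supports auto_flush_interval, something along these lines may help (the 2-second value is an arbitrary choice):

```
input {
  file {
    path => "/srv/logstash/logs/test_multiline.xml"
    type => "test-xml"
    start_position => "beginning"
    codec => multiline {
      # Assumption: each document starts with an XML declaration.
      # A line that starts with it begins a new event; every other
      # line is appended to the previous one.
      pattern => "^<\?xml"
      negate => "true"
      what => "previous"
      # Emit the buffered event after 2 seconds without new lines, so the
      # last document does not wait for another declaration to arrive.
      auto_flush_interval => 2
    }
  }
}
```

For very large documents the codec's max_lines and max_bytes limits may also need raising, since the whole document is accumulated into a single event.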
So I manually put the whole XML on a single line and tried this input configuration:
input {
  file {
    path => "/srv/logstash/logs/test.xml"
    type => "test-xml"
    start_position => "beginning"
    ignore_older => 0
  }
}
This time it works, but I don't understand why the mutate/replace filter overwrites my fields with the literal %{[fieldname][0]} text, when all I want is to replace the array generated by the XML filter with a single value:
{:timestamp=>"2016-10-31T19:03:33.614000+0000", :message=>"Pipeline main started"}
{
"message" => "<?xml version=\"1.0\" encoding=\"UTF-8\"?><properties><hostname>crt-mon</hostname><date>2016.11.01</date><time>01:23:04 CET</time><release>11.6</release><version>2.1</version></properties>",
"@version" => "1",
"@timestamp" => "2016-10-31T19:03:32.036Z",
"path" => "/srv/logstash/logs/test.xml",
"host" => "1afc6eb0026b",
"type" => "test-xml",
"hostname" => "crt-mon",
"date" => "2016.11.01",
"time" => "01:23:04 CET",
"release" => "11.6",
"version" => "2.1",
"timestamp" => "2016.11.01 01:23:04 CET"
}
{
"message" => "",
"@version" => "1",
"@timestamp" => "2016-10-31T19:03:33.624Z",
"path" => "/srv/logstash/logs/test.xml",
"host" => "1afc6eb0026b",
"type" => "test-xml",
"hostname" => "%{[hostname][0]}",
"date" => "%{[date][0]}",
"time" => "%{[time][0]}",
"release" => "%{[release][0]}",
"version" => "%{[version][0]}",
"timestamp" => "%{[date][0]} %{[time][0]}"
}
This is obviously a problem, because if I add a date/match filter later in the configuration to parse the timestamp field, I get a dateparsefailure from Logstash.
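A hedged reading of the second event in the output above: its message is empty, so it probably comes from a blank line in the file (a trailing blank line, for instance). The xpath expressions find nothing there, the fields are never created, and Logstash leaves an unresolved %{...} reference as literal text, which mutate/replace then stores. A minimal sketch, assuming blank lines can simply be discarded and that hostname is present in every valid document:

```
filter {
  # Discard events whose message is empty or whitespace only.
  if [message] =~ /^\s*$/ {
    drop { }
  }
  xml {
    store_xml => "false"
    source => "message"
    xpath => [
      "/properties/hostname/text()", "hostname",
      "/properties/date/text()", "date"
      # time, release and version follow the same pattern
    ]
  }
  # Only flatten the arrays when the xpath lookup actually produced them.
  if [hostname] {
    mutate {
      replace => {
        "hostname" => "%{[hostname][0]}"
        "date" => "%{[date][0]}"
      }
    }
  }
}
```

With a guard like this, an event that carries no XML never reaches the mutate step, so a later date filter should only see values that were really extracted.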
To sum up, I'm opening this topic to kindly ask for comments on these questions:
- What is the proper way to have Logstash handle huge multiline XML files?
- How can I get rid of the arrays when only one element is present?
Regards