Hello,
I'm using Logstash to parse an XML file, split it, do some basic extraction, and send the results to Elasticsearch. I also use Kibana. Everything runs in Kubernetes.
Logstash's version is 6.2.4.
Elasticsearch's version is 6.2.4.
Kibana's version is 6.2.4.
My input XML file has between 100,000 and 300,000 lines.
My config file is:
input {
  file {
    path => "/usr/share/logstash/files/test7.xml"
    start_position => beginning
    sincedb_path => "/dev/null"
    codec => multiline
    {
      pattern => "^<\?Report .*\>"
      negate => true
      what => "previous"
      max_lines => 300000
    }
  }
}
filter {
  xml {
    store_xml => false
    source => "message"
    xpath => ["/Report/Takes/Take", "take"]
  }
  mutate {
    remove_field => [ "message" ]
  }
  split {
    field => "[take]"
  }
  xml {
    source => "take"
    store_xml => "false"
    xpath => ["/Take/CarId/text()","carId"]
    xpath => ["/Take/ModelId/text()","modelId"]
    xpath => ["/Take/ColorDetails/Mode/text()","mode"]
    xpath => ["/Take/ColorDetails/Polarisation/text()","polarisation"]
  }
}
output {
  elasticsearch {
    index => "logstash-test-xml"
    hosts => ["es-svc:25000"]
    document_type => "xmlfiles"
  }
  stdout { codec => rubydebug }
}
My XML file (simplified for the forum to show only 2 examples of the "data" I use) looks like this:
<?xml version="1.0" standalone="yes"?>
<Report>
  <ReportingTime>2018-08-07T08:15:37</ReportingTime>
  <ValidityStart>2018-08-07T19:00:00</ValidityStart>
  <ValidityStop>2018-09-02T22:00:00</ValidityStop>
  <Takes>
    <Take>
      <CarId>S1A</CarId>
      <ModelId>164761</ModelId>
      <ColorDetails>
        <InstrumentId>Tec instrument</InstrumentId>
        <Mode>AA</Mode>
        <Swath>BB</Swath>
        <Polarisation>DV</Polarisation>
      </ColorDetails>
    </Take>
    <Take>
      <CarId>S1A</CarId>
      <ModelId>164762</ModelId>
      <ColorDetails>
        <InstrumentId>Tec instrument</InstrumentId>
        <Mode>AB</Mode>
        <Swath>DC</Swath>
        <Polarisation>DH</Polarisation>
      </ColorDetails>
    </Take>
  </Takes>
</Report>
Below is the debug-level log. It shows only the end, when the big XML file is almost completely parsed. After that I waited 4 minutes because no data had appeared in Elasticsearch, then killed Logstash. Only after the kill was my data sent to Elasticsearch:
https://pastebin.com/eViSL81t
(I had to post it on Pastebin because of the size limit on the forum.)
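Since the event only appears once Logstash shuts down, one variant that might be worth testing is sketched below. It is untested and rests on an assumption: that the multiline codec keeps the accumulated lines pending because no later line matches the pattern to close the event. The sketch adds the codec's `auto_flush_interval` option, which flushes a pending event after the given number of seconds without new lines. The value of 5 seconds is arbitrary, and everything else is unchanged from the input block above.

```
input {
  file {
    path => "/usr/share/logstash/files/test7.xml"
    start_position => beginning
    sincedb_path => "/dev/null"
    codec => multiline {
      pattern => "^<\?Report .*\>"   # unchanged from the config above
      negate => true
      what => "previous"
      max_lines => 300000
      # Assumption: flush the pending event after 5 s with no new lines,
      # instead of waiting for a next matching line or for shutdown.
      auto_flush_interval => 5
    }
  }
}
```

If the assumption holds, the single large event should reach the filters a few seconds after the file has been fully read, without killing Logstash.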
What is wrong with my configuration?
Thanks.