Hello,
I've been using the xml filter plugin because I need to parse some XML data.
Here is a simple example of the input:
<task code="a01" status="wip"/>
<task code="a02" status="nwg"/>
<task code="a03" status="nwg">
Description Line 1
Description Line 2
Description Line 3
Description Line 4
</task>
<task code="a04" status="wip">
<comment author="afusco">
I've finished this part.
</comment>
</task>
I'm trying to extract the code and status attributes of these tasks with the following filter:
filter {
xml {
source => "message"
store_xml => false
xpath => [
"/task/@code", "task_code",
"/task/@status", "task_status",
]
}
}
The thing is: the lines that contain just a <task>
tag are filtered correctly, and I can see the output is right. But when the rest of the lines are processed, for example the <comment>
tags, the parsing goes wrong.
To avoid that, I added the following simple condition just to drop the lines that aren't <task>
tags:
if [message] !~ /^<task/ {
drop { }
}
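A possible alternative (just a sketch, not tested): since store_xml => false only sets the target fields when the XPath expressions actually match something, you could drop the event based on the extraction result instead of pattern-matching the raw message. This assumes the xml filter leaves task_code unset on non-matching lines:

```
filter {
  xml {
    source => "message"
    store_xml => false
    xpath => [
      "/task/@code", "task_code",
      "/task/@status", "task_status"
    ]
  }
  # If the xpath expressions matched nothing, task_code was never set,
  # so the event can be dropped here instead of testing [message] with a regex.
  if ![task_code] {
    drop { }
  }
}
```

Note that non-XML lines (like the description lines) may still get tagged with _xmlparsefailure before being dropped.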
But this is just a workaround.
- Is there any way to parse only the desired specific tags, so that Elasticsearch doesn't also receive the undesired data? It would be good to drop an event if nothing in it matches the
xpath
array.
Example of the output:
When it's a <task>
tag:
{
"path" => "/var/log/xml2.log",
"@timestamp" => 2022-05-05T16:22:24.602Z,
"@version" => "1",
"task_code" => [
[0] "a01"
],
"task_status" => [
[0] "wip"
],
}
When it's not:
{
"@version" => "1",
"@timestamp" => 2022-05-06T12:27:01.559Z,
"host" => "elastic",
"message" => "\t<comment author=\"afusco\">",
"path" => "/var/log/xml2.log"
}
Thanks.