Hello,
I started working on Logstash and Elasticsearch and my first task was trying to get XML files indexed by Elasticsearch.
For that I have:
- a File input plugin that reads whole XML files
- an XML filter plugin that creates properties using XPath
- an Elasticsearch output plugin that everything should be sent to
My configuration looks like this:
input {
  file {
    path => "/absolute/path/to/xmls/**/*.xml"
    start_position => "beginning"
    max_open_files => 10000
    mode => "read"
    close_older => "1 minute"
    codec => multiline {
      pattern => "\Z"
      what => "previous"
    }
  }
}
filter {
  xml {
    source => "message"
    store_xml => false
    force_array => false
    xpath => [
      '/html/head/meta_identity/identifier/text()', "meta_identity_identifier",
      '/html/head/meta_identity/sortkey/text()', "meta_identity_sortkey",
      '/html/head/meta_identity/database/text()', "meta_identity_database",
      '/html/head/meta_identity/language/text()', "meta_identity_language"
    ]
  }
}
output {
  elasticsearch {
    index => "xml-data"
    hosts => ["localhost:9200"]
    sniffing => false
  }
  stdout { codec => rubydebug }
}
Now to evaluate, let's take an XML file. Here is a snippet of it to visualise:
<?xml version="1.0" encoding="ISO-8859-1"?>
<html>
  <head>
    <meta_identity>
      <identifier>a00001</identifier>
      <sortkey>000.000</sortkey>
      <database>Guidelines</database>
      <language>en</language>
    </meta_identity>
    ...
  </head>
</html>
As can be seen from my filter, I want to do a very simple thing and extract these four element values into properties for Elasticsearch.
But my problem is that the XPath entries are not being parsed: I get everything else (@version, @timestamp, etc.), but none of the properties I defined show up in either Elasticsearch or stdout.
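To make the expected result concrete, here is a minimal sketch outside Logstash of what I would expect the four properties to contain for the sample above. It is only an illustration, not how the xml filter works internally: Python's standard-library ElementTree supports only a subset of XPath, so the hypothetical helper `extract_fields` rewrites each absolute path into a path relative to the root element and drops the trailing `text()` step. The paths follow the element names in the sample document, and the elided `...` part of the sample is left out.

```python
import xml.etree.ElementTree as ET

# Sample document from the question (the elided "..." part is omitted).
SAMPLE = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<html>
  <head>
    <meta_identity>
      <identifier>a00001</identifier>
      <sortkey>000.000</sortkey>
      <database>Guidelines</database>
      <language>en</language>
    </meta_identity>
  </head>
</html>"""

# Output field name -> XPath expression, mirroring the filter configuration.
XPATHS = {
    "meta_identity_identifier": "/html/head/meta_identity/identifier/text()",
    "meta_identity_sortkey": "/html/head/meta_identity/sortkey/text()",
    "meta_identity_database": "/html/head/meta_identity/database/text()",
    "meta_identity_language": "/html/head/meta_identity/language/text()",
}


def extract_fields(xml_bytes, xpaths):
    """Return {field: text} for every expression that matches the document."""
    root = ET.fromstring(xml_bytes)
    fields = {}
    for field, xpath in xpaths.items():
        steps = xpath.strip("/").split("/")
        # ElementTree has no text() step; read the element's .text instead.
        if steps[-1] == "text()":
            steps = steps[:-1]
        # Absolute paths are not supported either, so match the first step
        # against the root tag and search relative to the root.
        if not steps or steps[0] != root.tag:
            continue
        node = root if len(steps) == 1 else root.find("/".join(steps[1:]))
        if node is not None and node.text is not None:
            fields[field] = node.text
    return fields


if __name__ == "__main__":
    for name, value in extract_fields(SAMPLE, XPATHS).items():
        print(f"{name} => {value}")
```

An expression that does not match any element is simply left out of the result, so a missing key here would point at the path itself rather than at the input or output stages.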
I tried adding a mutate filter to see if that might fix my issue:
filter {
  xml {
    ...
  }
  mutate {
    replace => [
      "meta_identity_identifier", "%{meta_identity_identifier}",
      "meta_identity_sortkey", "%{meta_identity_sortkey}",
      "meta_identity_database", "%{meta_identity_database}",
      "meta_identity_language", "%{meta_identity_language}"
    ]
  }
}
Now I can see the properties, but the values are not what they are supposed to be. They are shown as the literal strings %{meta_identity_language}, etc.
Logstash doesn't give any insight when run with --verbose.
What am I missing?