Hello,
I started working on Logstash and Elasticsearch and my first task was trying to get XML files indexed by Elasticsearch.
For that I have:
- a file input plugin that reads whole XML files
- an XML filter plugin that creates properties using XPath
- an Elasticsearch output plugin that everything should be sent to
 
My configuration looks like this:
input {
  file {
    path => "/absolute/path/to/xmls/**/*.xml"
    start_position => "beginning"
    max_open_files => 10000
    mode => "read"
    close_older => "1 minute"
    codec => multiline {
      pattern => "\Z"
      what => "previous"
    }
  }
}
filter {
  xml {
    source => "message"
    store_xml => false
    force_array => false
    xpath => [
      '/html/head/meta_identity/identifier/text()', "meta_identity_identifier",
      '/html/head/meta_identity/sortkey/text()', "meta_identity_sortkey",
      '/html/head/meta_identity/database/text()', "meta_identity_database",
      '/html/head/meta_identity/language/text()', "meta_identity_language"
    ]
  }
}
output {
  elasticsearch {
    index => "xml-data"
    hosts => ["localhost:9200"]
    sniffing => false
  }
  stdout { codec => rubydebug }
}
To test this, take one of the XML files. Here is a snippet of it:
<?xml version="1.0" encoding="ISO-8859-1"?>
<html>
<head>
 <meta_identity>
  <identifier>a00001</identifier>
  <sortkey>000.000</sortkey>
  <database>Guidelines</database>
  <language>en</language>
 </meta_identity>
 ...
</head>
</html>
As the filter shows, I only want to extract these four element values into properties for Elasticsearch.
The problem is that the XPath entries are not being extracted: I get everything else (@version, @timestamp, etc.), but none of the properties I defined show up in either Elasticsearch or stdout.
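To narrow this down, the xml filter can be exercised on its own, without the file input or the multiline codec. The following is only a minimal sketch: it assumes the generator input plugin is available, and the one-line document in message is a hand-shortened stand-in for the real files, not one of them. If the field shows up in the stdout output here, that would suggest the XPath side works and the problem lies in how the file content reaches the filter.

```
input {
  # Emit a single event whose message is a complete, well-formed XML document.
  generator {
    count => 1
    message => '<html><head><meta_identity><identifier>a00001</identifier></meta_identity></head></html>'
  }
}
filter {
  # Same filter settings as in the real pipeline, reduced to one XPath entry.
  xml {
    source => "message"
    store_xml => false
    xpath => [
      '/html/head/meta_identity/identifier/text()', "meta_identity_identifier"
    ]
  }
}
output {
  # Print the full event so the extracted field (or its absence) is visible.
  stdout { codec => rubydebug }
}
```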
I tried adding a mutate filter to see if that would fix the issue:
filter {
  xml {
    ...
  }
  mutate {
    replace => [
      "meta_identity_identifier", "%{meta_identity_identifier}",
      "meta_identity_sortkey", "%{meta_identity_sortkey}",
      "meta_identity_database", "%{meta_identity_database}",
      "meta_identity_language", "%{meta_identity_language}"
    ]
  }
}
Now I can see the properties, but the values are wrong: they are shown as the literal text %{meta_identity_language}, etc.
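As far as I understand, a %{field} reference that points to a field missing from the event is left in place as literal text, so this output suggests the xml filter never created the fields in the first place. A small sketch that makes this visible with a conditional instead of the replace; the tag name xpath_no_match is made up for illustration:

```
filter {
  xml {
    source => "message"
    store_xml => false
    xpath => [
      '/html/head/meta_identity/identifier/text()', "meta_identity_identifier"
    ]
  }
  # Tag events on which the XPath extraction produced no identifier field.
  if ![meta_identity_identifier] {
    mutate {
      add_tag => ["xpath_no_match"]
    }
  }
}
```

With this in place, events where the extraction failed carry the tag in the rubydebug output, rather than a field holding a placeholder string.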
Logstash doesn't give any further insight when run with --verbose.
What am I missing?