doc is not inside arguments in the XML you show, so the xml filter should be:
    xml {
        source => "message"
        store_xml => false
        xpath => [
            "//robot/suite/test/kw/doc/text()", "doc_field",
            "//robot/suite/test/kw/arguments/arg/text()", "arg_field"
        ]
    }
which will give you
    "doc_field" => [
        [0] "Some text I want to index"
    ],
    "arg_field" => [
        [0] "Some other text I want to index"
    ]
if the XML is a single event. By default, a file input reads each line of the file as a separate event and runs it through the pipeline. No single line of the file is valid XML, so none of it gets parsed. You need to use a multiline codec on the input to combine all the lines of the file into a single event.
The multiline codec in the input below takes every line that does not match ^Spalanzani (i.e., every line, assuming nothing in the file starts with that string) and appends it to the previous line, so the whole file becomes one event. The auto_flush_interval is required because otherwise the codec would wait forever for a line that does match ^Spalanzani before emitting the event.
    input {
        file {
            path => "/home/user/foo.xml"
            sincedb_path => "/dev/null"
            start_position => "beginning"
            codec => multiline { pattern => "^Spalanzani" negate => true what => "previous" auto_flush_interval => 2 }
        }
    }
This is using the file input in "tail" mode. That input also has a "read" mode which provides another way of doing this.
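For reference, a rough sketch of what the "read" mode variant might look like. It is untested against your file. The completed-log path is a made-up placeholder, and keeping the same multiline codec in read mode is my assumption about how you would want to combine the two. Note that in read mode the file input's default file_completed_action is "delete", so the sketch sets it to "log" to leave the source file in place.

```
input {
    file {
        path => "/home/user/foo.xml"
        mode => "read"
        sincedb_path => "/dev/null"
        # Default in read mode is "delete"; "log" keeps the source file on disk.
        file_completed_action => "log"
        # Hypothetical path, pick any location Logstash can write to.
        file_completed_log_path => "/tmp/logstash_completed.log"
        # Assumption: same codec as in tail mode, so all lines are joined into one event.
        codec => multiline { pattern => "^Spalanzani" negate => true what => "previous" auto_flush_interval => 2 }
    }
}
```

start_position is left out here because it only applies in tail mode; in read mode the file is always read from the beginning.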