Parse XML by Document not Element

As a test, I am trying to ingest the following simple XML file (the actual production file is huge).

<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>

...using the following Logstash pipeline configuration file:

input {
  file {
    path => [ "C:/temp/TEST/*.xml" ]
    start_position => "beginning"
  }
}

filter {
  xml {
    source => "message"
    target => "doc"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "test-results-%{+YYYY.MM.dd}"
  }
}

In Kibana, each element of the XML file shows up as its own "event". In this case there were 5 XML elements (e.g., <heading></heading>), so there are 5 hits.

What I want instead is to have one "hit" per document, with the XML elements as fields. So using the simple file as an example, there would be one hit with 5 fields. Is this possible?

Maybe part of the problem is that I'm not clear on how the source and target settings are to be used. That is, how do you create fields and then put the XML elements in the fields (for instance with xpath)?

Use a multiline codec on the file input as described here, so that the XML is in a single event.
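
By default the file input emits one event per line, which is why each element arrives as a separate hit. A minimal sketch of the multiline approach for the sample file follows. It assumes every document starts with <note> at the beginning of a line; the pattern and the auto_flush_interval value are illustrative choices, not something taken from the original pipeline.

```
input {
  file {
    path => [ "C:/temp/TEST/*.xml" ]
    start_position => "beginning"
    codec => multiline {
      # Assumption: each XML document begins with <note> at the start of a line.
      pattern => "^<note>"
      # Any line that does NOT match the pattern is appended to the previous line,
      # so everything up to the next <note> ends up in one event.
      negate => true
      what => "previous"
      # Flush the last buffered document after 2 seconds without new lines;
      # otherwise it waits until another <note> line arrives. The value is arbitrary.
      auto_flush_interval => 2
    }
  }
}
```

With this in place the whole <note>...</note> block should reach the filter stage as a single "message" field containing embedded newlines.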

You can also find a related answer here.

@Badger thank you! Two follow-ups if you don't mind:

  1. For this test, I'm using file input, but the end state will be beats input (from Filebeat on a file server), and this doc says multi-line codecs don't work with Beats input. Will the multi-line settings in filebeat.yml also work for my XML files?
  2. Can I still use the xpath filter plugin to parse XML that has been collapsed to one line?

Right now I'm looking for a way to read a large number of XML files and parse them one document per file, that is, split at end of file (because the content depends on an internal XML attribute). When Logstash reads the files, it seems to process several files in the same chain, and the output of that operation is not what I want.

  1. I am not using Filebeat, but I believe it has all the same multiline functionality that the codec does.

  2. Absolutely.
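
To illustrate the second point and the earlier question about source and target: once the whole document is in one event (whether it was joined by the multiline codec or by Filebeat), a sketch of the xml filter might look like the following. The destination field names note_to and note_heading are made up for this example, and force_array is an optional setting added here for readability.

```
filter {
  xml {
    # source: the field that holds the raw XML text (the joined multiline message).
    source => "message"
    # target: the field under which the parsed XML tree is stored,
    # e.g. the <to> element would appear under [doc][to].
    target => "doc"
    # Optional: store single child elements as plain values rather than one-item arrays.
    force_array => false
    # xpath: map an XPath expression to a destination field.
    # The field names on the right are hypothetical.
    xpath => {
      "/note/to/text()" => "note_to"
      "/note/heading/text()" => "note_heading"
    }
  }
}
```

As far as I know, fields populated through xpath are stored as arrays even when there is only one match, so expect something like note_to holding a one-element list.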
