Multiple events in the same input XML

After countless hours of trying, I'm about to give up, so I'm posting here as a last shot.
My goal is to index an XML file that contains multiple events on the same line. I'm using Logstash to pre-process the XML file before sending it to Elasticsearch.

Here's a sample of the input (I need to accept HTTP, file and Filebeat):

<?xml version='1.0' encoding='utf-8'?>
<root><group><event id="0"><metadata><meta id="1">meta1</meta><meta id="N">metaN</meta></metadata><payload>base64 file or html page</payload></event><event id="N"><metadata><meta id="1">meta1</meta><meta id="N">metaN</meta></metadata><payload>base64 file or html page</payload></event></group></root>

(formatted XML for better reading, but I need to work with the file above)

<?xml version="1.0" encoding="UTF-8"?>
<root>
   <group>
      <event id="0">
         <metadata>
            <meta id="meta1">meta1Content</meta>
            <meta id="metaN">metaNContent</meta>
         </metadata>
         <payload>base64 file or html page</payload>
      </event>
      <event id="N">
         <metadata>
            <meta id="1">meta1</meta>
            <meta id="N">metaN</meta>
         </metadata>
         <payload>base64 file or html page</payload>
      </event>
   </group>
</root>

Please note that the XML file contains just 2 lines, with no characters after the last >, so there is no trailing line ending.

Since the file holds multiple events, I expect my output to contain multiple JSON documents too. This is the format I expect for a single event:

{
   "event": {
      "id": 0,
      "meta1": "meta1content",
      "metaN": "metaNcontent",
      "payload": "base64 file or html page"
   }
}

Can you help me achieve this?

EDIT:
Adding a quick detail.
I got something close to my expected result, but I was working on the formatted XML. When I moved to the original file (the one with two lines), I couldn't get anything out of it because the pipeline stopped on the very first line.

What does your pipeline look like?

I'm not at work now, so I don't have my config. Is it really needed in order to give me some guidance?

Yes.

That is going to require a bunch of mutates, but did you get as far as getting one event per event?

  xml { source => "message" target => "theXML" }
  if [theXML][group][0][event] { split { field => "[theXML][group][0][event]" } }
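
As a rough sketch of the "bunch of mutates" step, the snippet above could be followed by a single ruby filter that flattens each split event into the shape from the original post. This assumes the xml filter's default force_array behaviour (so metadata and payload arrive as arrays) and that the text of an element with attributes is stored under a "content" key; both are assumptions worth checking against a stdout { codec => rubydebug } dump. The target field name "event" is taken from the expected output, and the id stays a string rather than a number.

```
filter {
  xml { source => "message" target => "theXML" }
  if [theXML][group][0][event] { split { field => "[theXML][group][0][event]" } }

  ruby {
    code => '
      ev = event.get("[theXML][group][0][event]")
      if ev.is_a?(Hash)
        out = { "id" => ev["id"] }
        # Assumed layout: metadata => [ { "meta" => [ { "id" => ..., "content" => ... } ] } ]
        metas = ev.dig("metadata", 0, "meta") || []
        metas.each { |m| out[m["id"]] = m["content"] if m.is_a?(Hash) }
        payload = ev["payload"]
        out["payload"] = payload.is_a?(Array) ? payload.first : payload
        event.set("event", out)
        # Drop the raw and intermediate fields once the flat copy exists
        event.remove("theXML")
        event.remove("message")
      end
    '
  }
}
```

Because the xml filter parses the whole document from "message" and split then fans out one Logstash event per array element, this approach does not depend on where the line breaks fall in the input.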

Not really, to be honest, because I went with a multiline codec splitting on <event> and this caused my pipeline to stop on the first line.

I'm not sure if I can try your suggestion because I don't have my current config file here at home, but I'll try to write a new one to see if I can get it working.

Thanks for your help.

Have you tried something like this?

input {
  file {
    path => "C:/Example/*.xml"
    codec => multiline {
      pattern => "<event>"
      what => "next"
    }
  }
}
filter {
  xml {
    source => "message"
    # Only the xpath results are kept, so no target is needed
    store_xml => false
    xpath => {
      "event/metadata/meta[@id='meta1']/text()" => "Meta1"
      "event/metadata/meta[@id='metaN']/text()" => "MetaN"
    }
  }
  # Discard the line holding the XML declaration
  if "xml version" in [message] {
    drop { }
  }
}

@wwalker @Badger

I just tried your solutions with the OP's XML and I always get 2 events in the same group, which is not what I need.

I need each event to be the root of the resulting JSON, meaning that I should get 2 JSON documents when trying with 2 events.

Ah, I missed the part where the event xml tag is different on each occurrence. Just need to change the multiline pattern to something like "<event *"....though I don't think * works as a wildcard and I can't recall what the wildcard character/pattern is.
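
For what it's worth, the multiline pattern is a regular expression rather than a glob, so no wildcard is needed: a pattern such as "<event[ >]" already matches both <event> and <event id="0">. The fragment below is only a sketch for the formatted (one tag per line) version of the file. The multiline codec joins whole lines, so it cannot separate events that share a single line, and the closing </group></root> tags would still end up attached to the last event. The negate, what and auto_flush_interval values are illustrative choices, not something taken from the posts above.

```
input {
  file {
    path => "C:/Example/*.xml"
    codec => multiline {
      # A line that starts a new <event ...> begins a new Logstash event;
      # every other line is appended to the event before it.
      pattern => "<event[ >]"
      negate => true
      what => "previous"
      # Flush the last buffered event after 2 seconds without new lines
      auto_flush_interval => 2
    }
  }
}
```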
