After countless hours of trying, I'm about to give up and I'm posting here as a final shot.
My goal is to index an XML file that contains multiple events on the same line, and I'm using Logstash to pre-process the XML before sending it to Elasticsearch.
Here's a sample of the input (I need to accept HTTP, file, and Filebeat inputs):
<?xml version='1.0' encoding='utf-8'?>
<root><group><event id="0"><metadata><meta id="1">meta1</meta><meta id="N">metaN</meta></metadata><payload>base64 file or html page</payload></event><event id="N"><metadata><meta id="1">meta1</meta><meta id="N">metaN</meta></metadata><payload>base64 file or html page</payload></event></group></root>
(the same XML formatted for readability; I still need to work with the single-line file above)
<?xml version="1.0" encoding="UTF-8"?>
<root>
<group>
<event id="0">
<metadata>
<meta id="meta1">meta1Content</meta>
<meta id="metaN">metaNContent</meta>
</metadata>
<payload>base64 file or html page</payload>
</event>
<event id="N">
<metadata>
<meta id="1">meta1</meta>
<meta id="N">metaN</meta>
</metadata>
<payload>base64 file or html page</payload>
</event>
</group>
</root>
Please note that the XML file contains just two lines, with no characters after the last >, so there is no trailing line ending.
Since the file contains multiple events, I expect the output to contain multiple JSON documents too. This is the format I'm expecting for a single event:
{
"event": {
"id": 0,
"meta1": "meta1content",
"metaN": "metaNcontent",
"payload": "base64 file or html page"
}
}
Can you help me achieve this?
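For reference, this is roughly the direction I've been trying. It's only a sketch: the field name parsed_xml and the split path [parsed_xml][group][0][event] are guesses of mine, and the paths, ports, and index name are placeholders.

input {
  # I eventually need all three of these; I'm testing with the file input for now.
  http {
    port => 8080
  }
  beats {
    port => 5044
  }
  file {
    path => "/path/to/input.xml"   # placeholder path
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

filter {
  # Parse the whole XML document into a structured field.
  xml {
    source => "message"
    target => "parsed_xml"
    store_xml => true
    force_array => true
  }

  # Emit one Logstash event per <event> element (the field path is my best guess).
  split {
    field => "[parsed_xml][group][0][event]"
  }

  # TODO: flatten each <meta> into a field named after its id attribute and keep
  # the payload; I assume this needs a ruby filter or xpath, but I haven't got
  # that part working yet.
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "events"   # placeholder index name
  }
  stdout { codec => rubydebug }
}

The part I'm least sure about is flattening the metadata into meta1/metaN fields, so any pointers on that are welcome too.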
EDIT:
Adding just a quick detail.
I got something close to my expected result, but I was working on the formatted XML. When I tried moving to the original file (the one with two lines), I couldn't get anywhere: the pipeline stopped on the very first line.
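My suspicion (from reading the file input docs) is that in the default tail mode the plugin waits for the newline delimiter before emitting a line, so the second line, which has no trailing newline, never gets flushed. This is a rough, untested sketch of the workaround I'm considering, switching the input to read mode so the plugin treats the file as complete content instead of tailing it (paths are placeholders):

input {
  file {
    path => "/path/to/input.xml"
    mode => "read"
    sincedb_path => "/dev/null"
    file_completed_action => "log"
    file_completed_log_path => "/tmp/completed.log"   # example path
  }
}

Alternatively I could just append a newline to the file before shipping it, but I'd rather not touch the producer. Does the read-mode approach sound right?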