I faced a similar problem parsing Nessus scans, which are in XML format. After countless nights of trying to make XPath and Logstash work together, my solution was to use a Python program that extracts all the data, converts it to JSON, and prints it to stdout. It sounds like a lot, but it is easy to implement:
Python Program
import json
import xml.etree.ElementTree as ET
..........
def getProperty(node, propertyName, defaultValue):
    # Text of the named child element, or the default if the child is missing.
    obj = node.find(propertyName)
    if obj is None:
        return defaultValue
    return obj.text

def getTag(node, tagName, defaultValue):
    # Value of the named attribute on the node, or the default if it is missing.
    obj = node.get(tagName)
    if obj is None:
        return defaultValue
    return obj
..........
def parseFile(f):
    root = ET.parse(f)
    for host in root.iter('MainKey'):
        mystruct = mynewstruct()
        mystruct.name = host.get('name')
        for properties in host.iter('mainProperties'):
            for tag in properties:
                if tag.get('name') == 'ip':
                    mystruct.ip = tag.text
                if tag.get('name') == 'os':
                    mystruct.os = tag.text
                ..........
        for item in host.iter('OtherItem'):
            # Skip items whose 'myitem' value is missing, zero or negative.
            if int(getTag(item, 'myitem', '0')) <= 0:
                continue
            mystruct.severity = getTag(item, 'myitem', '')
            mystruct.port = getTag(item, 'port', '')
            mystruct.description = getProperty(item, 'description', '')
            ..........
            # Emit one JSON document per item on stdout.
            j_data = json.dumps(mystruct.__dict__)
            print(j_data)
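Since the program above has elided parts, here is a minimal self-contained sketch of the same idea that can be run as-is. It is not the original script: the `Finding` dataclass stands in for `mynewstruct`, the sample XML is made up, and the element and attribute names (`MainKey`, `mainProperties`, `OtherItem`, `myitem`) are the same placeholders used above rather than a real Nessus schema.

```python
import io
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, asdict, replace

# Hypothetical sample input; names mirror the placeholders in the answer.
SAMPLE_XML = """<Report>
  <MainKey name="host-a">
    <mainProperties>
      <tag name="ip">10.0.0.5</tag>
      <tag name="os">Linux</tag>
    </mainProperties>
    <OtherItem myitem="0" port="22"><description>info only</description></OtherItem>
    <OtherItem myitem="3" port="443"><description>weak cipher</description></OtherItem>
  </MainKey>
</Report>"""


@dataclass
class Finding:
    """Stand-in for the elided mynewstruct (an assumption, not the original)."""
    name: str = ""
    ip: str = ""
    os: str = ""
    severity: str = ""
    port: str = ""
    description: str = ""


def get_property(node, property_name, default):
    # Text of a child element, or the default if the child or its text is missing.
    child = node.find(property_name)
    if child is None or child.text is None:
        return default
    return child.text


def get_tag(node, attr_name, default):
    # Attribute value on the node, or the default if the attribute is missing.
    return node.get(attr_name, default)


def parse_file(source):
    """Yield one JSON string per item whose 'myitem' attribute is above zero."""
    root = ET.parse(source).getroot()
    for host in root.iter("MainKey"):
        base = Finding(name=host.get("name", ""))
        for properties in host.iter("mainProperties"):
            for tag in properties:
                if tag.get("name") == "ip":
                    base.ip = tag.text or ""
                elif tag.get("name") == "os":
                    base.os = tag.text or ""
        for item in host.iter("OtherItem"):
            if int(get_tag(item, "myitem", "0")) <= 0:
                continue
            # Copy the host-level fields and fill in the per-item ones.
            finding = replace(
                base,
                severity=get_tag(item, "myitem", ""),
                port=get_tag(item, "port", ""),
                description=get_property(item, "description", ""),
            )
            yield json.dumps(asdict(finding))


if __name__ == "__main__":
    # One JSON document per line on stdout, ready for Logstash to pick up.
    for line in parse_file(io.StringIO(SAMPLE_XML)):
        print(line)
```

Copying the host-level fields into a fresh object for each item avoids carrying a stale `description` or `port` over from the previous item, which can happen when a single mutable struct is reused across the loop.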
Logstash Pipeline
input {
  exec {
    command => "python /mycustopath/scripts/parse.py"
    interval => 60
    codec => multiline {
      pattern => "^\n"
      what => "previous"
    }
  }
}
filter {
  if [message] == "()" {
    drop {}
  }
  json {
    source => "message"
  }
  ..........
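Because the `json` filter reads the `message` field, it can help to confirm that every line the script prints is valid JSON before wiring it into Logstash. The helper below is only a sketch for that check; `check_json_lines` is a hypothetical name and is not part of the original script or of Logstash.

```python
import json


def check_json_lines(text):
    """Try to parse each non-empty line of text as JSON.

    Returns a pair: (list of parsed objects, list of lines that failed).
    """
    parsed, bad = [], []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank separator lines
        try:
            parsed.append(json.loads(line))
        except json.JSONDecodeError:
            bad.append(line)
    return parsed, bad


if __name__ == "__main__":
    # Made-up sample output: two valid records and one stray "()" line.
    sample = '{"ip": "10.0.0.5", "port": "443"}\n()\n{"ip": "10.0.0.6", "port": "22"}\n'
    ok, bad = check_json_lines(sample)
    print(len(ok), "valid lines,", len(bad), "invalid lines")
```

Any line reported as invalid here, such as a stray `()`, is the kind of message that would either need to be dropped in the filter or fixed in the script.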
Hope it helps!
Useful References:
https://docs.python.org/3/library/xml.etree.elementtree.html