Indexing XML subfields as a new field

I faced a similar problem parsing some Nessus scans, which are in XML format. After countless nights of trying to make XPath work in Logstash, my solution was to use a Python program to extract all the data, convert it to JSON, and print it to stdout. It sounds like a lot, but it is easy to implement:

Python Program

import json
import xml.etree.ElementTree as ET
..........
def getProperty(node, propertyName, defaultValue):
	obj = node.find(propertyName)
	if obj is None:
		return defaultValue
	return obj.text

def getTag(node, tagName, defaultValue):
	obj = node.get(tagName)
	if obj is None:
		return defaultValue
	return obj
..........
def parseFile(f):
	root = ET.parse(f)
	for host in root.iter('MainKey'):
		# mynewstruct is a plain record class; its definition is not shown here
		mystruct = mynewstruct()
		mystruct.name = host.get('name')
		# Host-level properties are stored as children with a 'name' attribute
		for properties in host.iter('mainProperties'):
			for tag in properties:
				if tag.get('name') == 'ip':
					mystruct.ip = tag.text
				if tag.get('name') == 'os':
					mystruct.os = tag.text
..........
		for item in host.iter('OtherItem'):
			# Skip items with no severity; a missing attribute counts as 0
			if int(getTag(item, 'myitem', '0')) <= 0:
				continue
			mystruct.severity = getTag(item, 'myitem', '')
			mystruct.port = getTag(item, 'port', '')
			mystruct.description = getProperty(item, 'description', '')
..........
			# Emit one JSON document per item on stdout
			j_data = json.dumps(mystruct.__dict__)
			print(j_data)
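
For anyone who wants to try the idea end to end, here is a minimal, self-contained sketch of the same approach. It is not my exact script: the sample XML, the `Finding` dataclass and the function names `iter_findings` and `to_json_lines` are made up for illustration, and it reuses the placeholder element names from above (a real Nessus export uses different ones). It also builds a fresh record per item instead of reusing one object per host.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, asdict

# Hypothetical sample mirroring the placeholder names used in the post;
# a real Nessus export uses different element and attribute names.
SAMPLE_XML = """
<Report>
  <MainKey name="host-1">
    <mainProperties>
      <tag name="ip">10.0.0.5</tag>
      <tag name="os">Linux</tag>
    </mainProperties>
    <OtherItem myitem="0" port="0"><description>informational</description></OtherItem>
    <OtherItem myitem="3" port="443"><description>Weak cipher</description></OtherItem>
  </MainKey>
</Report>
"""


@dataclass
class Finding:
    """One flattened record per item (hypothetical field set)."""
    name: str = ""
    ip: str = ""
    os: str = ""
    severity: str = ""
    port: str = ""
    description: str = ""


def iter_findings(xml_text):
    """Yield one Finding per item whose severity is greater than 0."""
    root = ET.fromstring(xml_text)
    for host in root.iter("MainKey"):
        # Collect host-level properties into a dict keyed by their 'name' attribute
        props = {t.get("name"): t.text
                 for p in host.iter("mainProperties") for t in p}
        for item in host.iter("OtherItem"):
            severity = item.get("myitem", "0")
            if int(severity) <= 0:
                continue
            yield Finding(
                name=host.get("name", ""),
                ip=props.get("ip", ""),
                os=props.get("os", ""),
                severity=severity,
                port=item.get("port", ""),
                description=item.findtext("description", default=""),
            )


def to_json_lines(xml_text):
    """Return one JSON string per finding, ready to print to stdout."""
    return [json.dumps(asdict(f)) for f in iter_findings(xml_text)]


if __name__ == "__main__":
    for line in to_json_lines(SAMPLE_XML):
        print(line)
```

Each printed line is a complete JSON object, which is the shape the json filter expects to find in the message field.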

Logstash Pipeline

input {
    exec {
        command => "python /mycustopath/scripts/parse.py"
        interval => 60
        codec => multiline {
            pattern => "^\n"
            what => "previous"
        }
    }
}

filter {
    if [message] == "()" {
        drop {}
    }

    json {
        source => "message"
    }
..........

Hope it helps!

Useful References:
https://docs.python.org/3/library/xml.etree.elementtree.html
