Indexing XML subfields as a new field

Hi all,

I have an XML file with Windows event logs that is structured like this:

<Events>
      <Event>
             <Computer>...</Computer>
             ...
             <EventData> 
                <Data Name = "ErrorCode"> 0 </Data>
                <Data Name = "ipaddress"> 127.0.0.1 </Data>
                <Data Name = "port">800</Data>
                ...
             </EventData>
        </Event>
        <Event>
           ...
       </Event>
<Events>

The problem is that I can't extract the subfields of EventData and add them to the index as separate fields.
For example, I need to extract the subfield "ErrorCode" from EventData and add a separate field "error" to the index holding the value of ErrorCode, i.e. 0.
At first I added this xpath option to my xml filter:

filter {
    xml {
        source => "message"
        store_xml => false
        target => "Event"
        xpath => ["/Event/EventData/Data[@Name='ipaddress']", "sourceIP"]
    }
}

But the field wasn't created.
So I tried using a split filter:

filter {
    xml {
        source => "message"
        store_xml => false
        target => "parsed"
    }
    split {
        field => "[parsed][Event]"
        add_field => {
            sourceIP => "%{[parsed][Event][EventData][Data][@Name='ipaddress']}"
        }
    }
    mutate {
        remove_field => ["message", "host"]
    }
}

That didn't work either. I searched online but only found a solution that uses a ruby filter, and with that approach I can't rename my field:
Xml filtering subfields - #23

ruby {
        code => '
            e = event.get("[@metadata][theXML][Event][EventData][Data]")
            if e
                e.each { |x|
                    event.set(x["Name"], x["content"])
                }
            end
        '
    }

Is there any other way to extract subfields, or am I doing something wrong in this line of my conf file: sourceIP => "%{[parsed][Event][EventData][Data][@Name='ipaddress']}"

It is unclear what your XML structure looks like. Do you have multiple <event> elements inside an <events> element? If so, then

xpath => { "/Events/Event/EventData/Data[@Name='ipaddress']" => "sourceIP" }

will produce

"sourceIP" => [
    [0] "<Data Name=\"ipaddress\"> 127.0.0.1 </Data>"
],

and

xpath => { "string(/Events/Event/EventData/Data[@Name='ipaddress'])" => "sourceIP" }

will produce

"sourceIP" => [
    [0] " 127.0.0.1 "
],

and

xpath => { "normalize-space(string(/Events/Event/EventData/Data[@Name='ipaddress']))" => "sourceIP" }

will produce

"sourceIP" => [
    [0] "127.0.0.1"
],
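
The same three stages (the serialized element, its raw string value, and the whitespace-normalized value) can be reproduced outside Logstash as a sanity check. This is only a rough sketch in Python using the standard library's ElementTree, not Logstash's XPath engine; the sample document is a well-formed, trimmed version of the structure in the question, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Well-formed, trimmed sample based on the structure in the question.
SAMPLE = """<Events>
  <Event>
    <EventData>
      <Data Name="ErrorCode"> 0 </Data>
      <Data Name="ipaddress"> 127.0.0.1 </Data>
    </EventData>
  </Event>
</Events>"""

root = ET.fromstring(SAMPLE)  # root is the <Events> element

# ElementTree paths are relative to the element they are called on,
# so the path starts at Event rather than /Events/Event.
node = root.find("Event/EventData/Data[@Name='ipaddress']")

# Stage 1: the whole element, comparable to the plain node-set result.
serialized = ET.tostring(node, encoding="unicode").strip()
# Stage 2: the raw text, comparable to string(...).
raw_text = node.text
# Stage 3: collapsed whitespace, comparable to normalize-space(...).
normalized = " ".join(raw_text.split())

print(repr(serialized))
print(repr(raw_text))
print(repr(normalized))
```

If the first stage already comes back empty, the path does not match the document as parsed, and no amount of string() or normalize-space() wrapping will help.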

Yes, I have multiple <event> elements inside an <events> element. So how can I get

sourceIP => 127.0.0.1

xpath always creates an array. You could use a split filter on that.

Hey, thank you for your insight. I went through the earlier comments of a closed topic and found that we can use force_array => false.
I still have the issue that the subfield is not extracted in Logstash, although the same expression extracts it in an online XPath tester.
I used the xpath:
/Event/EventData/Data[@Name='IpAddress']/text()

I tried the same approach on another set of sample logs and it worked:

<form>
    <row>
        <panel>
            <title>CCR_CUSTOMER_FEATURE_AVG_RESPONSE_TIME</title>
            <chart>
                <search>
                    <query>index="asvcardmadinternalbureau" | stats avg(response_time_in_ms) as "Average response_time_in_ms" | gauge "Average response_time_in_ms" 0 50 75 100</query>
                    <earliest>$timerange.earliest$</earliest>
                    <latest>$timerange.latest$</latest>
                </search>
                <option name="charting.chart">radialGauge</option>
                <option name="charting.legend.placement">right</option>
            </chart>
        </panel>
    </row>
    <row>
        <panel>
            <title>CCR_CUSTOMER_FEATURE_COUNT</title>
            <chart>
                <search>
                    <query>index="asvcardmadinternalbureau" | stats count</query>
                    <earliest>$timerange.earliest$</earliest>
                    <latest>$timerange.latest$</latest>
                </search>
            </chart>
        </panel>
    </row>
</form>

with conf file:

input{
    file{
        path => ""
        start_position => "beginning"
        sincedb_path => "nul"
        type => "xml"
        codec => multiline {
            pattern => "^\s\s\s\s(\<row\>)"
            negate => "true"
            what => "previous"
        }
    }
}
filter{
    xml{
        source => "message"
        store_xml => false
        target => "row"
        remove_namespaces => true
        xpath => [
            "/row/panel/title/text()", "VizTitle",
            "/row/panel/chart/search/query/text()", "VizQuery",
            "/row/panel/chart/search/earliest/text()", "EarliestTime",
            "/row/panel/chart/search/latest/text()", "LatestTime",
            "/row/panel/chart/option[@name='charting.chart']/text()" , "typechart"
        ]
        
    }
}

output {
     elasticsearch {
        action=>"index"
        index=>"xml_log"
        hosts=>["localhost:9200"]
    }
    stdout { codec => rubydebug}
}

This works fine and extracts the fields, but the same approach doesn't work with my Windows logs and conf file.
My Windows logs are:

<Events>
     <Event xmlns:conf ='http://schemas.microsoft.com/win/2004/08/events/event'>
        <System>
            <EventID>4624</EventID>
            <Version>2</Version>
            <Level>0</Level>
            <Task>12544</Task>
            <Opcode>0</Opcode>
            <Execution ProcessID='1088' ThreadID='8364'/>
            <Channel>Security</Channel>
            <Security/>
        </System>
        <EventData>
            <Data Name='ProcessId'>0x408</Data>
            <Data Name='ProcessName'>C:\Windows\System32\services.exe</Data>
            <Data Name='IpAddress'>160.39.47.0</Data>
            <Data Name='IpPort'>-</Data>
            <Data Name='ImpersonationLevel'>%%1833</Data>
            <Data Name='RestrictedAdminMode'>-</Data>
            <Data Name='TargetOutboundUserName'>-</Data>
            <Data Name='TargetOutboundDomainName'>-</Data>
            <Data Name='VirtualAccount'>%%1843</Data>
            <Data Name='TargetLinkedLogonId'>0x0</Data>
            <Data Name='ElevatedToken'>%%1842</Data>
        </EventData>
    </Event>
</Events>

and the Logstash conf file:

input {
    file {
        path => ""
        start_position => "beginning"
        sincedb_path => "nul"
        type => "xml"
        codec => multiline {
            pattern => "</Event>"
            negate => "true"
            what => "previous"
        }
    }
}
filter {
    xml{
        source => "message"
        store_xml => false
        target => "Event"
        remove_namespaces => true
        force_array => false
        xpath =>[
            "/Event/EventData/Data[@Name='IpAddress']/text()" , "source_ip",
            "/Event/System/EventID" , "idofevent"
        ]
    }
}
output{
    elasticsearch {
        action=>"index"
        index=>"unaccounted_location"
        hosts=>["localhost:9200"]
    }
    stdout{ codec => rubydebug}
}

This one doesn't extract any fields. Even idofevent is not extracted. Why can't I extract fields from my Windows logs? Can you please help me?

If you are using xpath then the XML must be valid. The <Events> at the start of the first event means it will get an _xmlparsefailure.
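
To see what each multiline chunk looks like with that configuration, here is a rough Python model of the grouping. It is a simplified sketch of how I understand what => "previous" with negate => "true" to behave (a line that matches the pattern starts a new event, every other line is appended to the current one), not the codec's actual code. The helper names group_lines and is_well_formed are made up for illustration, and the sample is shortened.

```python
import re
import xml.etree.ElementTree as ET

def group_lines(lines, pattern):
    """Simplified model of multiline with what=previous, negate=true."""
    events, buffer = [], []
    for line in lines:
        starts_new_event = re.search(pattern, line) is not None
        if starts_new_event and buffer:
            events.append("\n".join(buffer))
            buffer = []
        buffer.append(line)
    if buffer:
        events.append("\n".join(buffer))  # stands in for the final flush
    return events

def is_well_formed(text):
    """True if text parses as a complete XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Shortened version of the Windows sample from the question.
SAMPLE_LINES = [
    "<Events>",
    "  <Event>",
    "    <System><EventID>4624</EventID></System>",
    "  </Event>",
    "</Events>",
]

chunks = group_lines(SAMPLE_LINES, r"</Event>")
for chunk in chunks:
    print(is_well_formed(chunk), repr(chunk))
```

In this model the first chunk begins with <Events> and never closes it, so it cannot be parsed as a document on its own.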

I suggest you use </Events> as the multiline pattern and then add a split filter. You may need to add an auto_flush_interval to the codec if you are consuming the entire file.
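
The data flow I have in mind (parse the whole <Events> document once, then produce one record per <Event> with the named Data values as renamed fields) looks roughly like this in Python. This is a sketch of the idea only, not Logstash configuration. FIELD_MAP is a hypothetical lookup table using the field names from the question, and the second event's values are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from <Data Name="..."> values to index field names.
FIELD_MAP = {"ErrorCode": "error", "ipaddress": "sourceIP"}

def split_events(xml_text):
    """Return one dict per <Event>, holding only the mapped Data values."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    records = []
    for event in root.iter("Event"):
        record = {}
        for data in event.iter("Data"):
            field = FIELD_MAP.get(data.get("Name"))
            if field is not None:
                record[field] = (data.text or "").strip()
        records.append(record)
    return records

# Well-formed sample; the second event's values are invented.
SAMPLE = """<Events>
  <Event>
    <EventData>
      <Data Name="ErrorCode"> 0 </Data>
      <Data Name="ipaddress"> 127.0.0.1 </Data>
      <Data Name="port">800</Data>
    </EventData>
  </Event>
  <Event>
    <EventData>
      <Data Name="ErrorCode"> 5 </Data>
      <Data Name="ipaddress"> 10.0.0.7 </Data>
    </EventData>
  </Event>
</Events>"""

for record in split_events(SAMPLE):
    print(record)
```

Data elements whose Name is not in the table (port here) are skipped, so the renaming and the filtering happen in the same lookup.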

Hey, your suggestion is that I should use </Events> as the pattern for the multiline codec, right? But the complete file is actually as follows:

<Events>
      <Event>
             <Computer>...</Computer>
             ...
             <EventData> 
                <Data Name = "ErrorCode"> 0 </Data>
                <Data Name = "ipaddress"> 127.0.0.1 </Data>
                <Data Name = "port">800</Data>
                ...
             </EventData>
        </Event>
        <Event>
           ...
       </Event>
<Events>

So we need to start a new line for each <Event>.

I also tried removing the <Events> tag and passing just one <Event>, but it still didn't work.

Anyway, I'll try using auto_flush_interval or a split filter.

I faced a similar problem parsing some Nessus scans that are in XML format. My solution, after countless nights of trying to work with xpath and Logstash, was to use a Python program to extract all the data, convert it to JSON and print it to stdout. It sounds like a lot, but it is easy to implement:

Python Program

import json
import xml.etree.ElementTree as ET
..........
def getProperty(node, propertyName, defaultValue):
    # Text of the child element propertyName, or defaultValue if it is missing.
    obj = node.find(propertyName)
    if obj is None:
        return defaultValue
    return obj.text

def getTag(node, tagName, defaultValue):
    # Value of the attribute tagName, or defaultValue if it is missing.
    obj = node.get(tagName)
    if obj is None:
        return defaultValue
    return obj
..........
def parseFile(f):
    root = ET.parse(f)
    for host in root.iter('MainKey'):
        mystruct = mynewstruct()
        mystruct.name = host.get('name')
        for properties in host.iter('mainProperties'):
            for tag in properties:
                if tag.get('name') == 'ip':
                    mystruct.ip = tag.text
                if tag.get('name') == 'os':
                    mystruct.os = tag.text
..........
        for item in host.iter('OtherItem'):
            # Skip items whose myitem attribute is zero or negative.
            if int(item.get('myitem')) <= 0:
                continue
            mystruct.severity = getTag(item, 'myitem', '')
            mystruct.port = getTag(item, 'port', '')
            mystruct.description = getProperty(item, 'description', '')
..........
            # Print one JSON object per item for the Logstash json filter.
            j_data = json.dumps(mystruct.__dict__)
            print(j_data)

Logstash Pipeline

input {
    exec {
        command => "python /mycustopath/scripts/parse.py"
        interval => 60
        codec => multiline {
            pattern => "^\n"
            what => "previous"
        }
    }
}

filter {
    if [message] == "()" {
        drop {}
    }

    json {
        source => "message"
    }
..........

Hope it helps!

Useful References:
https://docs.python.org/3/library/xml.etree.elementtree.html

OK, so that is not valid XML and you will need to modify it before passing it to an xml filter.
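
A quick way to confirm this before involving Logstash is to run the file through any XML parser. Below is a minimal Python sketch. It assumes the only defect is the one visible in the posted file, a final <Events> line that opens a second element instead of closing the first. The helper names is_well_formed and close_root are made up, and the sample content is shortened with an invented Computer value.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """True if xml_text parses as a complete XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

def close_root(xml_text, root_tag="Events"):
    """Turn a trailing '<Events>' line into '</Events>'.

    Assumption: the last line repeating the opening root tag is the
    only thing wrong with the document.
    """
    lines = xml_text.rstrip().splitlines()
    if len(lines) > 1 and lines[-1].strip() == f"<{root_tag}>":
        lines[-1] = f"</{root_tag}>"
    return "\n".join(lines)

# Shortened sample; the Computer value is invented.
BROKEN = "<Events>\n  <Event><Computer>pc1</Computer></Event>\n<Events>"

print(is_well_formed(BROKEN))              # parser rejects the original
print(is_well_formed(close_root(BROKEN)))  # accepted once the root is closed
```

If the real file has other problems, the parser's error message will usually point at the line and column to look at.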

Wow, that's great. Thanks a lot. This definitely made my day. Finally I can rest now.

Yeah, now I get it. Thanks. This was a great discussion. I learned a lot.
