Logstash XML Filter - xpath won't select element value

I am using Logstash 7.6.1 to ingest an XML file that was generated using a standard Microsoft Windows tool called WinInet Trace. Although I am able to ingest the entire XML, I want to extract some of the values in the XML tree into their own fields, and I've been trying to use xpath to do it.

The data file looks like this:

<Events>
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
	<System>
		<Provider Guid="{9e814aad-3204-11d2-9a82-006008a86939}" />
		<EventID>0</EventID>
		<Version>2</Version>
		<Level>0</Level>
		<Task>0</Task>
		<Opcode>0</Opcode>
		<Keywords>0x0</Keywords>
		<TimeCreated SystemTime="2020-03-27T14:00:48.756779200+00:00" />
		<Correlation ActivityID="{00000000-0000-0000-0000-000000000000}" />
		<Execution ProcessID="7500" ThreadID="7464" ProcessorID="0" KernelTime="90" UserTime="30" />
		<Channel />
		<Computer />
	</System>
	<EventData>
		<Data Name="BufferSize">    8192</Data>
		<Data Name="Version">83951878</Data>
		<Data Name="ProviderVersion">    7601</Data>
		<Data Name="NumberOfProcessors">       2</Data>
		<Data Name="EndTime">132298058270657554</Data>
		<Data Name="TimerResolution">  156001</Data>
		<Data Name="MaxFileSize">       0</Data>
		<Data Name="LogFileMode">0x0</Data>
		<Data Name="BuffersWritten">   17696</Data>
		<Data Name="StartBuffers">       1</Data>
		<Data Name="PointerSize">       8</Data>
		<Data Name="EventsLost">       1</Data>
		<Data Name="CPUSpeed">    2400</Data>
		<Data Name="LoggerName">0x5</Data>
		<Data Name="LogFileName">0x7</Data>
		<Data Name="BootTime">132297485993751998</Data>
		<Data Name="PerfFreq">10000000</Data>
		<Data Name="StartTime">132297912487567792</Data>
		<Data Name="ReservedFlags">0x1</Data>
		<Data Name="BuffersLost">       0</Data>
		<Data Name="SessionNameString">wininettrace</Data>
		<Data Name="LogFileNameString">C:\Temp\wininettrace.etl</Data>
	</EventData>
	<RenderingInfo Culture="en-GB">
		<Opcode>Header</Opcode>
		<Provider>MSNT_SystemTrace</Provider>
		<EventName xmlns="http://schemas.microsoft.com/win/2004/08/events/trace">EventTrace</EventName>
	</RenderingInfo>
	<ExtendedTracingInfo xmlns="http://schemas.microsoft.com/win/2004/08/events/trace">
		<EventGuid>{68fdd900-4a3e-11d1-84f4-0000f80464e3}</EventGuid>
	</ExtendedTracingInfo>
</Event>

The configuration file is this:

input {
  file {
    path => "c:/traces/pcbd/20200327/sample.xml"
    start_position => "beginning"
    sincedb_path => "NUL"
    type => "xml"

    codec => multiline {
      pattern => "<Event "
      negate => true
      what => "previous"
      auto_flush_interval => 1
    }
  }
}
filter {
  xml {
    source => "message"
    target => "wininet"
    store_xml => true
    xpath => [ "//Event/System/EventID/text()", "System.EventID" ]
  }
}
filter {
  mutate { remove_field => [ "message" ] }
}
output {
  elasticsearch {
    hosts => "localhost"
    index => "pcbd"
  }
  stdout {
    codec => rubydebug
  }
}

The records get ingested with all of the XML content in the JSON, but the System.EventID field isn't created.

I spent ages looking for the answer to this, and I'll post the answer next.

The answer is to add remove_namespaces => true to the xml filter, giving a config file that looks like this:

input {
  file {
    path => "c:/traces/pcbd/20200327/sample.xml"
    start_position => "beginning"
    sincedb_path => "NUL"
    type => "xml"

    codec => multiline {
      pattern => "<Event "
      negate => true
      what => "previous"
      auto_flush_interval => 1
    }
  }
}
filter {
  xml {
    # Strip namespaces (including the default xmlns on <Event>) so the
    # unprefixed xpath below can match
    remove_namespaces => true
    source => "message"
    target => "wininet"
    store_xml => true
    xpath => [ "//Event/System/EventID/text()", "System.EventID" ]
  }
}
filter {
  mutate { remove_field => [ "message" ] }
}
output {
  elasticsearch {
    hosts => "localhost"
    index => "pcbd"
  }
  stdout {
    codec => rubydebug
  }
}

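When I read the documentation for the remove_namespaces parameter, I thought it only referred to removing namespace prefixes from tags, e.g. <xmlns:Event>. It seems it also removes the default namespace declared by the xmlns attribute (here xmlns="http://schemas.microsoft.com/win/2004/08/events/event" on <Event>) before the xpath expressions are evaluated. Without that, the unprefixed path //Event/System/EventID matches nothing, because Event and all of its descendants belong to that namespace, and an unprefixed XPath name only matches elements that are in no namespace.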
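If you would rather keep the namespaces, the xml filter also has a namespaces option that binds a prefix to a namespace URI so the xpath expressions can use it. The sketch below has not been tested against this trace file: the prefix e is an arbitrary name chosen for illustration, and the URI is the default namespace declared on <Event> in the sample. Every step of the path needs the prefix, because each child element inherits the default namespace from <Event>.

```
filter {
  xml {
    source => "message"
    target => "wininet"
    store_xml => true
    # Hypothetical prefix "e", bound to the default namespace declared on <Event>
    namespaces => {
      "e" => "http://schemas.microsoft.com/win/2004/08/events/event"
    }
    # Each step is prefixed, since System and EventID are also in that namespace
    xpath => [ "//e:Event/e:System/e:EventID/text()", "System.EventID" ]
  }
}
```

A namespace-agnostic expression such as //*[local-name()='EventID']/text() should also match without either option, at the cost of matching an EventID element in any namespace.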
