Logstash XML Filter - xpath won't select element value

I am using Logstash 7.6.1 to ingest an XML file that was generated by a standard Microsoft Windows tool called WinInet Trace. I am able to ingest the entire XML, but I also want to generate some hashes from the values in the XML tree, and I've been trying to use xpath for that.

The data file looks like this:

<Events>
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
	<System>
		<Provider Guid="{9e814aad-3204-11d2-9a82-006008a86939}" />
		<EventID>0</EventID>
		<Version>2</Version>
		<Level>0</Level>
		<Task>0</Task>
		<Opcode>0</Opcode>
		<Keywords>0x0</Keywords>
		<TimeCreated SystemTime="2020-03-27T14:00:48.756779200+00:00" />
		<Correlation ActivityID="{00000000-0000-0000-0000-000000000000}" />
		<Execution ProcessID="7500" ThreadID="7464" ProcessorID="0" KernelTime="90" UserTime="30" />
		<Channel />
		<Computer />
	</System>
	<EventData>
		<Data Name="BufferSize">    8192</Data>
		<Data Name="Version">83951878</Data>
		<Data Name="ProviderVersion">    7601</Data>
		<Data Name="NumberOfProcessors">       2</Data>
		<Data Name="EndTime">132298058270657554</Data>
		<Data Name="TimerResolution">  156001</Data>
		<Data Name="MaxFileSize">       0</Data>
		<Data Name="LogFileMode">0x0</Data>
		<Data Name="BuffersWritten">   17696</Data>
		<Data Name="StartBuffers">       1</Data>
		<Data Name="PointerSize">       8</Data>
		<Data Name="EventsLost">       1</Data>
		<Data Name="CPUSpeed">    2400</Data>
		<Data Name="LoggerName">0x5</Data>
		<Data Name="LogFileName">0x7</Data>
		<Data Name="BootTime">132297485993751998</Data>
		<Data Name="PerfFreq">10000000</Data>
		<Data Name="StartTime">132297912487567792</Data>
		<Data Name="ReservedFlags">0x1</Data>
		<Data Name="BuffersLost">       0</Data>
		<Data Name="SessionNameString">wininettrace</Data>
		<Data Name="LogFileNameString">C:\Temp\wininettrace.etl</Data>
	</EventData>
	<RenderingInfo Culture="en-GB">
		<Opcode>Header</Opcode>
		<Provider>MSNT_SystemTrace</Provider>
		<EventName xmlns="http://schemas.microsoft.com/win/2004/08/events/trace">EventTrace</EventName>
	</RenderingInfo>
	<ExtendedTracingInfo xmlns="http://schemas.microsoft.com/win/2004/08/events/trace">
		<EventGuid>{68fdd900-4a3e-11d1-84f4-0000f80464e3}</EventGuid>
	</ExtendedTracingInfo>
</Event>

The configuration file is this:

input {
  file {
    path => "c:/traces/pcbd/20200327/sample.xml"
    start_position => "beginning"
    sincedb_path => "NUL"
    type => "xml"
	
    codec => multiline {
      pattern => "<Event "
      negate => true
      what => "previous"
      auto_flush_interval => 1
    }
  }
}
filter {
    xml {
      source => "message"
      target => "wininet"
      store_xml => true
      xpath => [ "//Event/System/EventID/text()", "System.EventID" ]
    }
}
filter {
  mutate { remove_field => [ "message" ] }
}
output {
    elasticsearch {
      hosts => "localhost"
      index => "pcbd"
    }
    stdout {
      codec => rubydebug
    }
}

The records get ingested with all of the XML content in the JSON, but the System.EventID hash isn't generated.

I spent ages looking for the answer to this, and I'll post the answer next.

The answer is to add remove_namespaces => true to the xml filter, giving a config file that looks like this:

input {
  file {
    path => "c:/traces/pcbd/20200327/sample.xml"
    start_position => "beginning"
    sincedb_path => "NUL"
    type => "xml"
	
    codec => multiline {
      pattern => "<Event "
      negate => true
      what => "previous"
      auto_flush_interval => 1
    }
  }
}
filter {
    xml {
      remove_namespaces => true
      source => "message"
      target => "wininet"
      store_xml => true
      xpath => [ "//Event/System/EventID/text()", "System.EventID" ]
    }
}
filter {
  mutate { remove_field => [ "message" ] }
}
output {
    elasticsearch {
      hosts => "localhost"
      index => "pcbd"
    }
    stdout {
      codec => rubydebug
    }
}

When I read the notes for the remove_namespaces parameter, I thought it only referred to stripping namespace prefixes from tags, e.g. <ns:Event>. It seems it also removes the default xmlns attribute from tags during preprocessing. Without that, the elements stay in the Microsoft event namespace, and an unprefixed xpath such as //Event/System/EventID matches nothing.
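
To illustrate why the unprefixed path fails, here is a minimal sketch in Python. It is not Logstash: it uses the standard library's xml.etree.ElementTree as a stand-in for whatever parser the xml filter uses internally, and the cut-down document and the "e" prefix are made up for the example. The namespace rule it shows is general to XPath-style matching, though.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the sample event above.
NS = "http://schemas.microsoft.com/win/2004/08/events/event"

# Cut-down version of one event (hypothetical, for illustration only).
doc = f'<Event xmlns="{NS}"><System><EventID>0</EventID></System></Event>'
root = ET.fromstring(doc)

# 1. Unprefixed path: the elements live in the default namespace,
#    so plain names do not match them.
plain = root.find("System/EventID")

# 2. Namespace-aware path: bind an arbitrary prefix ("e") to the URI.
qualified = root.find("e:System/e:EventID", {"e": NS})

# 3. Strip the xmlns declaration before parsing, which is roughly the
#    effect remove_namespaces => true appears to have.
stripped = ET.fromstring(doc.replace(f' xmlns="{NS}"', ""))
after_strip = stripped.find("System/EventID")

print(plain)             # no match
print(qualified.text)    # matched via the prefix
print(after_strip.text)  # matched after stripping the namespace
```

The xml filter documentation also lists a namespaces option for registering prefixes to use in xpath expressions, which may be an alternative to stripping namespaces altogether. I have not tested that with this data.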
