Help using Logstash XML Filter to Parse Offline Windows Event Logs

I am trying to ingest Windows Event (Application, Security, System) logs into Logstash that were pulled from an offline Windows system; ELK is running on a separate offline system. The logs were saved in XML format (snippet below). I have tried multiple methods, found both on this forum and on Stack Overflow, to parse the Windows Event fields (computer name, IP, EventID, etc.), but with each attempt the individual fields are not parsed or added as fields in Kibana to filter on.

<?xml version="1.0" encoding="UTF-8" standalone="true"?>
<Events>
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
    <System>
        <Provider Name="Redacted"/>
        <EventID Qualifiers="16384">1704</EventID>
        <Level>4</Level>
        <Task>0</Task>
        <Keywords>0x80000000000000</Keywords>
        <TimeCreated SystemTime="2018-05-02T16:13:47.000000000Z"/>
        <EventRecordID>10052</EventRecordID>
        <Channel>Application</Channel>
        <Computer>COMPUTER.LOCAL</Computer>
        <Security/>
    </System>
    <EventData>
        <Data/>
    </EventData>
</Event>

<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
    <System>
        <Provider Name="Windows Error Reporting"/>
        <EventID Qualifiers="0">1001</EventID>
        <Level>4</Level>
    ..........
</Event>

I take it you are aware of Winlogbeat and have reason not to use it.

Assuming your file has a trailing </Events> to make it valid XML, the following should help.

Logstash is really good at reading complete files, right up to the point where it is not. For example, some inputs insert an event boundary when they reach 16 KB of input. If you can avoid those corner cases, this will read the file.

input {
    file {
        path => "/path/to/foo.xml"
        codec => multiline {
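            # a line containing <Events starts a new event; with negate => true
            # and what => "previous", every other line is appended to the
            # current event, so the rest of the file becomes a single event.
            # auto_flush_interval emits that event after 2 seconds of no new
            # input, since no later line will ever match and terminate it.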
            pattern => "<Events"
            negate => true
            what => "previous"
            auto_flush_interval => 2
        }
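        # read from the top of the file and do not persist the read position,
        # so the file is re-read in full on every run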
        start_position => "beginning"
        sincedb_path => "/dev/null"
    }
}

This will parse it:

filter {
    xml {
        source => "message"
        store_xml => true
        target => "theXML"
        force_array => false
    }
}

giving you something like

{
    "theXML" => {
        "Event" => [
            [0] {
                        "xmlns" => "http://schemas.microsoft.com/win/2004/08/events/event",
                       "System" => {
                              "EventID" => {
                            "Qualifiers" => "16384",
                               "content" => "1704"
                        },
                              "Channel" => "Application",
                             "Provider" => {
                            "Name" => "Redacted"
                        },
                             "Computer" => "COMPUTER.LOCAL",
                          "TimeCreated" => {
                            "SystemTime" => "2018-05-02T16:13:47.000000000Z"
                        },
                                "Level" => "4",
                                 "Task" => "0",
                             "Keywords" => "0x80000000000000",
                        "EventRecordID" => "10052"
                    },
                    "EventData" => nil
                },

Which you might want to chop up using

split { field => "[theXML][Event]" }
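
(That turns the [theXML][Event] array into one Logstash event per array element.)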

Oh, and since the XML declaration line ends up as an event of its own (and will not parse as XML), you might want to add

if [message] =~ /<\?xml/ { drop {} }

I know about Winlogbeat and use it on some hosts. But I am unable to install software on this Windows box, so I was provided the exported logs as XML.

You are correct in assuming that it ends in </Events>. The XML documents range in size from 10 MB to at least 400 MB, which Logstash should not have an issue chomping through.

I had tried using <Event> rather than <Events> in the pattern field. I also had not yet tried auto_flush_interval, target, or force_array, so hopefully those fix it.

I will update once I am back at my computer. Thank you.

That is not going to work at those sizes if you try to initially fold <Events> (i.e. the whole file) into a single Logstash event. It might be possible to slightly adjust the Logstash config to drop {} the lines containing Events and then do the multiline using <Event. The use case might be better served, though, by switching to Filebeat, with a multiline log prospector feeding Logstash on a beats input.
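
For reference, a rough sketch of that Filebeat route (assuming Filebeat 6.x prospector syntax; the path, host, and port are placeholders). On the side shipping the exported logs:

filebeat.prospectors:
  - type: log
    paths:
      - /path/to/foo.xml
    # start a new multiline event at each <Event ...> opening tag and
    # append every other line to it
    multiline.pattern: '^\s*<Event '
    multiline.negate: true
    multiline.match: after

output.logstash:
  hosts: ["localhost:5044"]

And on the Logstash side (the gsub workaround for the trailing </Events> is an untested assumption):

input {
  beats {
    port => 5044
  }
}

filter {
  # the XML declaration and the <Events> wrapper line arrive ahead of
  # the first <Event>; discard anything that is not an <Event> block
  if [message] !~ /<Event / { drop {} }

  # the closing </Events> gets appended to the final <Event> block;
  # strip it so the xml filter sees well-formed XML
  mutate { gsub => [ "message", "</Events>", "" ] }

  xml {
    source => "message"
    store_xml => true
    target => "theXML"
    force_array => false
  }
}

With one <Event> per Logstash event there is nothing to split; the fields land under [theXML][System] directly.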

I am trying to parse only the 10MB file right now, running with the following configuration file.

input {
  file {
    path => "/Users/Bacon/Desktop/computer_application.xml"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    type => "xml"
    codec => multiline {
      pattern => "<Event \?xml/"
      negate => true
      what => "previous"
      auto_flush_interval => 2
    }
  }
}

filter {
  xml {
    source => "message"
    store_xml => true
    target => "theXML"
    force_array => false
  }
  if [message] =~ /<Event \?xml/ {
    drop {}
    split {
      field => "[Event]"
    }
  }
}

output {
  elasticsearch {
    # codec => json
    hosts => "192.168.1.194:9200"
    index => "aaxmlparsing"
  }
}

It appears to parse out some of the schema information, but it does not create fields in Kibana based on these; the documents come through looking like this:

10: d3081572-e3f0-49f5-8b83-ec763c014570, 1, 0 [(0 [0xC004F014, 0, 0], [(?)(?)(?)(?)(?)(?)(?)(?)])(1 )(2 )]
11: d6992aac-29e7-452a-bf10-bbfb8ccabe59, 1, 0 [(0 [0xC004F014, 0, 0], [(?)(?)(?)(?)(?)(?)(?)(?)])(1 )(2 )]
12: dcb88f6f-b090-405b-850e-dabcccf3693f, 1, 0 [(0 [0xC004F014, 0, 0], [(?)(?)(?)(?)(?)(?)(?)(?)])(1 )(2 )]
13: f002931d-5536-4908-8d93-40ae584e24d6, 1, 0 [(0 [0xC004F014, 0, 0], [(?)(?)(?)(?)(?)(?)(?)(?)])(1 )(2 )]
14: 9d0bb49b-21a1-4354-9981-ec5dd9393961, 1, 0 [(0 [0xC004F014, 0, 0], [(?)(?)(?)(?)(?)(?)(?)(?)])(1 )(2 )]
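
For the record, two things in that last configuration would keep the fields from appearing: the multiline pattern "<Event \?xml/" mixes the event-start tag with the drop regex, so it never matches any line, and the split is nested inside the drop conditional, where it can never run because drop {} has already discarded the event. A corrected sketch, assembled from the suggestions earlier in the thread (untested):

input {
  file {
    path => "/Users/Bacon/Desktop/computer_application.xml"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    type => "xml"
    codec => multiline {
      # fold everything after the <Events> line into a single event
      pattern => "<Events"
      negate => true
      what => "previous"
      auto_flush_interval => 2
    }
  }
}

filter {
  # the XML declaration line becomes an event of its own; discard it
  if [message] =~ /<\?xml/ { drop {} }

  xml {
    source => "message"
    store_xml => true
    target => "theXML"
    force_array => false
  }

  # one Logstash event per <Event> element
  split { field => "[theXML][Event]" }
}

output {
  elasticsearch {
    hosts => "192.168.1.194:9200"
    index => "aaxmlparsing"
  }
}

Per the earlier caveat, though, folding a 400 MB file into a single event is likely to hurt; at that size the Filebeat route sketched above is the safer option.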
