Trouble getting XML into usable format


#1

Hello, I am not sure whether this belongs under Logstash or Beats, but I am using Filebeat to ship an XML file of exported Windows events. Some of the data is neat and should be easy to handle with a KV filter that splits values on a colon, but my trouble is the data wrapped in XML tags (<...> and </...>), because I am not sure how to go about separating it. Here is a sample of the log as I see it in Kibana:

Subject: Security ID: S-1-5-21-2341602717-1282933724-2322446451-1007 
Account Name: TestUser 
Account Domain: TestDomain 
Logon ID: 0x22e0725e4 
Logon Type: 2 

This event is generated when a logon session is destroyed. It may be positively correlated with a logon event using the Logon ID value. Logon IDs are only unique between reboots on the same computer.</Message><Level>Information</Level><Task>Logoff</Task><Opcode>Info</Opcode><Channel>Security</Channel><Provider>Microsoft Windows security auditing.</Provider><Keywords><Keyword>Audit Success</Keyword></Keywords></RenderingInfo></Event><Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'><System><Provider Name='Microsoft-Windows-Security-Auditing' Guid='{54849625-5478-4994-A5BA-3E3B0328C30D}'/><EventID>4625</EventID><Version>0</Version><Level>0</Level><Task>12544</Task><Opcode>0</Opcode><Keywords>0x8010000000000000</Keywords><TimeCreated SystemTime='2018-10-08T06:18:48.497302000Z'/><EventRecordID>9681517</EventRecordID><Correlation/><Execution ProcessID='612' ThreadID='4640'/><Channel>Security</Channel><Computer>test.networksrv.com</Computer><Security/></System><EventData><Data Name='SubjectUserSid'>S-1-0-0</Data><Data Name='SubjectUserName'>-</Data><Data Name='SubjectDomainName'>-</Data><Data Name='SubjectLogonId'>0x0</Data><Data Name='TargetUserSid'>S-1-0-0</Data><Data Name='TargetUserName'>admin</Data><Data Name='TargetDomainName'></Data><Data Name='Status'>0xc000006d</Data><Data Name='FailureReason'>%%2313</Data><Data Name='SubStatus'>0xc0000064</Data><Data Name='LogonType'>3</Data><Data Name='LogonProcessName'>NtLmSsp </Data><Data Name='AuthenticationPackageName'>NTLM</Data><Data Name='WorkstationName'></Data><Data Name='TransmittedServices'>-</Data><Data Name='LmPackageName'>-</Data><Data Name='KeyLength'>0</Data><Data Name='ProcessId'>0x0</Data><Data Name='ProcessName'>-</Data><Data Name='IpAddress'>-</Data><Data Name='IpPort'>-</Data></EventData><RenderingInfo Culture='en-US'><Message>An account failed to log on.

Thanks for your time


(Walker) #2

Your best bet is probably to use the xml filter and its xpath option. We'd be better able to help you if we could see the raw data before it is processed by the Elastic Stack. What you've given us doesn't really help.
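
As a rough sketch of what that could look like: assuming each event arrives in the `message` field as one complete `<Event>...</Event>` element (not yet confirmed for your data), the xml filter's `xpath` option can pull individual values into fields. The destination field names below are made up for illustration.

```
filter {
  xml {
    source => "message"          # field holding the XML text (assumption)
    store_xml => false           # only run the xpath expressions, skip the full parse
    remove_namespaces => true    # lets the plain paths below match despite the xmlns on <Event>
    xpath => {
      # "xpath expression" => "destination field" (field names are hypothetical)
      "/Event/System/EventID/text()" => "event_id"
      "/Event/System/Computer/text()" => "computer"
      "/Event/EventData/Data[@Name='TargetUserName']/text()" => "target_user_name"
    }
  }
}
```

`remove_namespaces => true` is there because the `<Event>` element declares a default namespace, which would otherwise keep the unprefixed paths from matching. As far as I know, values extracted with `xpath` are stored as arrays, even when only one node matches.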


#3

hope this example helps you,


#4

I cut down the XML file for ease of testing, but I get the same results with the full file. Here is a trimmed version of the XML file.

> <?xml version="1.0" encoding="utf-8" standalone="yes"?>
> <Events><Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'><System><Provider Name='Microsoft-Windows-Security-Auditing' Guid='{54849625-5478-4994-A5BA-3E3B0328C30D}'/><EventID>4673</EventID><Version>0</Version><Level>0</Level></System></Event></Events>

(Walker) #5

I'm sorry, I just noticed the XML data at the end of the example you provided in your first post. Where does the data that's formatted like your original example come from? Is it a text file? Is the raw data formatted like your original example or like the example you just provided? The solution depends on how Logstash receives the data.


#6

The XML data is in a text file. It is a file exported from Windows Events. The original file contains a lot more data, but I removed everything except for one entry for testing.

The raw data looks just like what I posted, but it has several lines like
> <Event ...>
> <System>...


(Walker) #7

OK, what does your input config look like?


#8

I really don't have anything at the moment. As stated at the top, some of the data shows up like this:

Subject: Security ID: S-1-5-21-2341602717-1282933724-2322446451-1007 
Account Name: TestUser 
Account Domain: TestDomain 
Logon ID: 0x22e0725e4 
Logon Type: 2

This event is generated when a logon session is destroyed. It may be positively correlated with a logon event using the Logon ID value. Logon IDs are only unique between reboots on the same computer.</Message><Level>Information</Level><Task>Logoff</Task><Opcode>Info</Opcode><Channel>Security</Channel><Provider>Microsoft Windows security auditing.</Provider><Keywords><Keyword>Audit Success</Keyword></Keywords></RenderingInfo></Event><Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'><System><Provider Name='Microsoft-Windows-Security-Auditing' Guid='{54849625-5478-4994-A5BA-3E3B0328C30D}'/><EventID>4625</EventID><Version>0</Version><Level>0</Level><Task>12544</Task><Opcode>0</Opcode><Keywords>0x8010000000000000</Keywords><TimeCreated SystemTime='2018-10-08T06:18:48.497302000Z'/><EventRecordID>9681517</EventRecordID><Correlation/><Execution ProcessID='612' ThreadID='4640'/><Channel>Security</Channel><Computer>test.networksrv.com</Computer><Security/></System><EventData><Data Name='SubjectUserSid'>S-1-0-0</Data><Data Name='SubjectUserName'>-</Data><Data Name='SubjectDomainName'>-</Data><Data Name='SubjectLogonId'>0x0</Data><Data Name='TargetUserSid'>S-1-0-0</Data><Data Name='TargetUserName'>admin</Data><Data Name='TargetDomainName'></Data><Data Name='Status'>0xc000006d</Data><Data Name='FailureReason'>%%2313</Data><Data Name='SubStatus'>0xc0000064</Data><Data Name='LogonType'>3</Data><Data Name='LogonProcessName'>NtLmSsp </Data><Data Name='AuthenticationPackageName'>NTLM</Data><Data Name='WorkstationName'></Data><Data Name='TransmittedServices'>-</Data><Data Name='LmPackageName'>-</Data><Data Name='KeyLength'>0</Data><Data Name='ProcessId'>0x0</Data><Data Name='ProcessName'>-</Data><Data Name='IpAddress'>-</Data><Data Name='IpPort'>-</Data></EventData><RenderingInfo Culture='en-US'><Message>An account failed to log on.

I know I can create a KV for the data with a colon, but I have no idea how to go about separating the data in the XML format for fields and values, like "Level: Information", "Task: Logoff" etc.


(Walker) #9

First off, do you really need to ingest the text files, or can you install Winlogbeat and let it do the event ingestion?

If you really do need to ingest .txt files, it looks like some tags are missing from the XML data in your example: there's no closing tag for the Message element at the bottom. Regardless, you'll need to set up the file input plugin with the multiline codec and then configure an if statement with the drop filter to remove the non-XML data (it looks like it's replicated from the XML anyway). Next, you'll have to use the mutate filter to remove all the text before the XML data. Finally, you'll use the xml filter to parse the XML data into separate fields.

The config below will get you started. The multiline pattern and the regex patterns should work for all your events, assuming there aren't other XML tags that were removed from your example.

input {
  file {
    # Add file input configs...
    codec => multiline {
      pattern => "<Level>"
      negate => true
      what => "previous"
    }
  }
}
filter {
  if [message] !~ "^<Level.*" {
    drop { }
  }
  mutate {
    gsub => [
      "message", "^.*<\/Message>", ""
    ]
  }
  xml {
    # Add xml filter configs...
  }
}
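
To fill in the xml filter placeholder, one possible starting point is below. It is only a sketch: it assumes the `message` field ends up holding a single well-formed XML element after the cleanup steps, and `winevent` is a made-up target field name.

```
filter {
  xml {
    source => "message"          # field that holds the XML text (assumption)
    target => "winevent"         # hypothetical field for the parsed result
    remove_namespaces => true    # ignore the xmlns declaration on <Event>
    force_array => false         # keep single elements as plain values instead of arrays
  }
}
```

With this, values such as the event ID should show up under nested fields like `[winevent][System][EventID]`. If the XML is not well-formed, the filter tags the event with `_xmlparsefailure` instead of parsing it, which is a quick way to check whether the missing tags are still a problem.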