How to parse dir command output from txt file and append filenames to directory


I am trying to parse dir command txt files that are formatted as follows:

 Directory of C:\Windows\addins

09/15/2018  02:33 AM    <DIR>          .
09/15/2018  02:33 AM    <DIR>          ..
09/15/2018  02:29 AM               802 FXSEXT.ecf
               1 File(s)            802 bytes

 Directory of C:\Windows\ADFS

04/30/2021  09:27 AM    <DIR>          .
04/30/2021  09:27 AM    <DIR>          ..
04/30/2021  09:27 AM    <DIR>          en
04/30/2021  09:24 AM            40,960 Microsoft.IdentityServer.Deployment.Core.dll
               1 File(s)         40,960 bytes

 Directory of C:\Windows\ADFS\en

04/30/2021  09:27 AM    <DIR>          .
04/30/2021  09:27 AM    <DIR>          ..
04/30/2021  09:25 AM             6,144 Microsoft.IdentityServer.Deployment.Core.Resources.dll
               1 File(s)          6,144 bytes

 Directory of C:\Windows\appcompat

09/15/2018  02:33 AM    <DIR>          .
09/15/2018  02:33 AM    <DIR>          ..
09/15/2018  02:33 AM    <DIR>          appraiser
09/15/2018  02:33 AM    <DIR>          Programs
02/28/2021  08:08 PM    <DIR>          UA
               0 File(s)              0 bytes

What I am trying to do is parse this unstructured, dynamic data, join each filename to the directory it falls under, and put each result in its own event. For example, the first directory above is C:\Windows\addins, and I want to take the FXSEXT.ecf file (and any other file that may exist, but none of the other output), append it to that directory path, and create a separate event for it, so this would turn into an event like this:


along with the creation time for that file only and the file size.

There can be anywhere from zero to a great many filenames, but I am only trying to get the files and not the directories under each listing, since the dir command walks the entire filesystem and each subdirectory eventually shows up as its own "Directory of" section later in the data.

I have tried the multiline codec, and I have tried to grok the data, which worked but does not help me prepend the file path, because the directory line has already been written to a prior event. I'm not sure where to go with this and would appreciate any help. Thank you in advance.

I think it is better to work in the collection script to get the filenames with the full path and all the other attributes. Python, PowerShell, or even bash could produce nicely formatted results (as CSV output, if you like) for you to parse in Logstash. Try this, for example:

Get-ChildItem "." -file | Select-Object FullName, Length, CreationTime

That was my initial thought as well. My only worry with using PowerShell is older systems that predate it, or servers where it was optional and may not be installed, because I would lose data on those. I could add a check for PowerShell in my script and use this method where it is available, but I think (not totally sure) that I may still run into older systems where this will not work. I had written a forensic analysis script in PowerShell that breaks on some Server 2008 machines due to similar issues, whereas the dir command has shipped with Windows for a long time. So whatever option I choose has to work with any Windows system that may be in production.

You could stash the directory name in a ruby class variable (so that it can be accessed from both ruby filter instances)

    if [message] =~ "^$|<DIR>| File\(s\)| Dir\(s\)" { drop {} }
    if "Directory of" in [message] {
        grok { match => { "message" => "Directory of %{GREEDYDATA:dirName}" } }
        ruby { code => '@@dirName = event.get("dirName")' }
        drop {}
    }
    grok { match => { "message" => "^(?<[@metadata][timestamp]>\d{2}/\d{2}/\d{4}  \d{2}:\d{2} (AM|PM))%{SPACE}%{NOTSPACE:filesize} (?<[@metadata][filename]>[^\n]*)" } }
    mutate { convert => { "filesize" => "integer" } }
    date { match => [ "[@metadata][timestamp]", "MM/dd/yyyy  hh:mm aa" ] }
    ruby { code => 'event.set("filename", @@dirName + "\\" + event.get("[@metadata][filename]"))' }

which will produce things like

  "filesize" => 40960,
  "filename" => "C:\\Windows\\ADFS\\Microsoft.IdentityServer.Deployment.Core.dll",
   "message" => "04/30/2021  09:24 AM            40,960 Microsoft.IdentityServer.Deployment.Core.dll",
"@timestamp" => 2021-04-30T13:24:00.000Z
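
To illustrate why the second ruby filter can see what the first one stored, here is a standalone Ruby sketch. It assumes both ruby filter blocks behave like two instances of the same class; `DemoFilter` and its methods are hypothetical names for illustration, not Logstash code.

```ruby
# Hypothetical stand-in for two ruby filter instances of the same class.
class DemoFilter
  # Store the directory both as a class variable (shared by all instances)
  # and as an instance variable (private to this one instance).
  def remember(dir)
    @@dir_name = dir
    @dir_name = dir
  end

  # Build the full path from the shared class variable.
  def full_path(file)
    @@dir_name + "\\" + file
  end

  # Expose the per-instance copy for comparison.
  def instance_copy
    @dir_name
  end
end

writer = DemoFilter.new   # plays the filter that sees "Directory of ..."
reader = DemoFilter.new   # plays the filter that sees the file line

writer.remember('C:\Windows\addins')
shared_path = reader.full_path("FXSEXT.ecf")
puts shared_path               # the reader sees the writer's directory
p reader.instance_copy         # nil: instance variables are not shared
```

An instance variable (`@dirName`) would stay private to the filter that set it, which is why the class variable (`@@dirName`) is used above.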

Not sure if you want to use @timestamp for the timestamp.
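
On that point, the sample above turns 09:24 AM into 13:24Z, which suggests the local time was read as UTC-4. As far as I know, the date filter uses the Logstash host's time zone unless its timezone option is set. A small standalone Ruby check of that arithmetic follows; the -0400 offset is an assumption inferred from the sample output, not something stated in the dir listing.

```ruby
require "time"

# Timestamp text exactly as it appears in the dir output (two spaces after the date).
raw = "04/30/2021  09:24 AM"

# ASSUMPTION: the listing was produced in a UTC-4 zone, inferred from the sample event.
assumed_offset = "-0400"

# %I is the 12-hour clock and %p is AM/PM, mirroring "hh:mm aa" in the date filter.
local = Time.strptime("#{raw} #{assumed_offset}", "%m/%d/%Y  %I:%M %p %z")

utc_string = local.utc.iso8601(3)
puts utc_string
```

If the files come from machines in different zones, the offset would have to come from somewhere else, since dir does not print it.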

You will need pipeline.workers set to 1 and pipeline.ordered set to auto (the default in 7.0) or true.
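
The reason ordering matters is that each file line is paired with whichever "Directory of" line was seen most recently. A minimal standalone Ruby sketch of that logic, useful for testing the regexes outside Logstash, is below. The names `parse_dir_listing`, `SKIP`, and `FILE_LINE` are hypothetical, and the patterns are adapted from the filter above rather than copied from Logstash internals.

```ruby
# Lines to ignore: blanks, directory entries, and the per-directory summaries.
SKIP = /^\s*$|<DIR>| File\(s\)| Dir\(s\)/
# Date and time, file size (may contain commas), then the file name.
FILE_LINE = %r{^(\d{2}/\d{2}/\d{4}  \d{2}:\d{2} (?:AM|PM))\s+(\S+) (.*)$}

def parse_dir_listing(lines)
  current_dir = nil   # state carried from one line to the next
  events = []
  lines.each do |line|
    line = line.chomp
    next if line =~ SKIP
    if (m = line.match(/Directory of (.*)/))
      current_dir = m[1].strip   # remember the directory, emit nothing
      next
    end
    m = line.match(FILE_LINE)
    next unless m && current_dir
    events << {
      "created"  => m[1],
      "filesize" => m[2].delete(",").to_i,
      "filename" => current_dir + "\\" + m[3]
    }
  end
  events
end

sample = [
  ' Directory of C:\Windows\addins',
  '',
  '09/15/2018  02:33 AM    <DIR>          .',
  '09/15/2018  02:29 AM               802 FXSEXT.ecf',
  '               1 File(s)            802 bytes',
  ' Directory of C:\Windows\ADFS',
  '04/30/2021  09:27 AM    <DIR>          en',
  '04/30/2021  09:24 AM            40,960 Microsoft.IdentityServer.Deployment.Core.dll'
]

events = parse_dir_listing(sample)
events.each { |e| puts e }
```

If the lines were handled out of order, for example by several workers, a file line could be joined to the wrong directory, which is what the single-worker, ordered settings guard against.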

Thank you so much. I've tried for a couple of days to figure this out; I just didn't realize Ruby would retain the variable after the first event was written. That is awesome!
