Grok to extract info from non KV formatted data?

OK, I'm back with another head scratcher.
I'm very new to this, but I'm learning.

I'm sending my entire message through kv{} to have it break out the KV pairs for me,
but I also need KV pairs for items that aren't logged as this="that".
For instance, the values I want to harvest are in our ModSecurity log entries.
I would like to be able to analyze the triggered rule IDs, e.g. [id "981203"].
What grok would I use to have Logstash find this bit of info and create a KV pair for it?
Something like owasp_rule = "981203", or whatever it should be.
The kv{} filter is doing a great job on the this="that" type data, but I need to be able to pick out the bracketed informational fields too, like id, msg, line, rev, accuracy...

The data coming into Logstash looks like this:

[Fri Aug 30 09:19:31.035231 2019] [security2:error] [pid 3064:tid 140334373766912] [client x.x.x.x:45747] [client x.x.x.x] ModSecurity: Warning. String match "bytes=0-" at REQUEST_HEADERS:Range. [file "/content/waf/2.7.3/modsecurity_crs_protocol_violations.conf"] [line "427"] [id "958291"] [rev "2"] [msg "Range: field exists and begins with 0."] [data "bytes=0-"] [severity "WARNING"] [ver "OWASP_CRS/2.2.7"] [maturity "6"] [accuracy "8"] [tag "OWASP_CRS/PROTOCOL_VIOLATION/INVALID_HREQ"] [hostname ""] [uri "/path-to-resource"] [unique_id "XWkiY38AAAEAAAv4@34AAAAM"], referer:
[Fri Aug 30 09:19:31.016044 2019] timestamp="1567171171" srcip="x.x.x.x" localip="y.y.y.y" user="-" host="z.z.z.z" method="GET" statuscode="200" reason="-" extra="-" exceptions="-" duration="30788" url="/default.aspx" server="" referer="" cookie="ASP.NET_SessionId=tazkoispw5xrymn2vjz5uung; SecurityToken=SecurityTokenID=4ds3545-n34-4hfc-2dhy-4kgju76s458c&Issued=8/30/2019 1:18:38 PM&Expires=8/30/2019 11:18:38 PM; UserInfo=UserID=0&UserID=0&TAID=0&Username=jSchmoe&FullName=Joe Schmoe&" set-cookie="-" recvbytes="957" sentbytes="17206" protocol="HTTP/1.1" ctype="text/html" uagent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36" querystring="" websocket_scheme="-" websocket_protocol="-" websocket_key="-" websocket_version="-" ruleid="31"

Thanks in advance for any insight you guys can offer a newb.

Please?.. lol

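You need to strip off the parts that do not match (everything up to the first bracketed pair after "ModSecurity:", and the trailing text after the last "]"), but then a kv filter can handle that ModSecurity line.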

    mutate { gsub => [ "message", ".*ModSecurity: [^\[]+\[", "" ] }
    mutate { gsub => [ "message", "][^\[]+$", "" ] }
    kv { field_split_pattern => "] \[" value_split => " " }

will get you

       "rev" => "2",
       "msg" => "Range: field exists and begins with 0.",
        "id" => "958291",
 "unique_id" => "XWkiY38AAAEAAAv4@34AAAAM",
      "line" => "427",
       "ver" => "OWASP_CRS/2.2.7",
      "data" => "bytes=0-",
  "severity" => "WARNING",
  "maturity" => "6",
  "hostname" => "",
      "file" => "/content/waf/2.7.3/modsecurity_crs_protocol_violations.conf",
       "uri" => "/path-to-resource"

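If you only want the rule ID as its own field, a grok filter with a named capture would also work. This is a minimal sketch rather than a tested config: the owasp_rule field name comes from your question, and the tag_on_failure setting is my assumption that lines without a ModSecurity entry should not be tagged _grokparsefailure. For the sample line it should produce owasp_rule => "958291".

```
filter {
  grok {
    # Single-quoted string, so the double quotes in the pattern need no escaping.
    # The pattern is not anchored, so it can match anywhere in the message.
    match => { "message" => '\[id "(?<owasp_rule>[0-9]+)"\]' }
    # Assumption: most lines carry no [id "..."] and should not be tagged as failures.
    tag_on_failure => []
  }
}
```
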
Badger, you have been really helpful. Thanks for the info, I do appreciate it.

So would I add a secondary instance of the kv filter in my .conf, or do I inline this new transformation with the other one you suggested (which is working great, thanks)?

    filter {
        mutate { gsub => [ "message", ".*ModSecurity: [^\[]+\[", "" ] }
        mutate { gsub => [ "message", "][^\[]+$", "" ] }
        kv { field_split_pattern => "] \[" value_split => " " }
        dissect { mapping => { "message" => "[%{[@metadata][timestamp]}]%{}" } }
        date { match => [ "[@metadata][timestamp]", "EEE MMM dd HH:mm:ss.SSSSSS yyyy" ] }
    }

Or would this be an argument for an if statement, because the ModSecurity info isn't always there?

Yes, you should use if so that you only apply the kv filter to lines that you expect to match it.
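
A rough sketch of how that could look, not a tested config. Two things here are my assumptions: the dissect and date filters are moved ahead of the gsub calls, because gsub overwrites message and removes the leading timestamp, and the else branch stands in for your existing kv{} on the this="that" lines.

```
filter {
  # Parse the leading [timestamp] first, while message is still intact.
  dissect { mapping => { "message" => "[%{[@metadata][timestamp]}]%{}" } }
  date { match => [ "[@metadata][timestamp]", "EEE MMM dd HH:mm:ss.SSSSSS yyyy" ] }

  if "ModSecurity: " in [message] {
    # ModSecurity lines: strip the non-matching parts, then split on "] [".
    mutate { gsub => [ "message", ".*ModSecurity: [^\[]+\[", "" ] }
    mutate { gsub => [ "message", "][^\[]+$", "" ] }
    kv { field_split_pattern => "] \[" value_split => " " }
  } else {
    # Assumption: every other line is this="that" data for a default kv filter.
    kv { }
  }
}
```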

Ok. Sounds good, I'll try to cobble something together today.
You've been a great help!
