Grok!

As I understand it, grok patterns are named regular expressions (like DNS names for IP addresses). I am trying to build some patterns for the following log entries.

New Node: 212.55.78.183 Issue: Semalt Project  Spam Bot Network: LLC "McLaut-Invest"
Abuse unresolved for 75 days: 195.34.150.18 host:  / Liberty Global
New Node: 54.36.149.71 Issue:   Intelligence Bot Network: Unknown
New Node: 54.36.148.72 Issue:   Intelligence Bot Network: Unknown
New Node: 54.36.148.173 Issue:   Intelligence Bot Network: Unknown
New Node: 221.237.208.10 Issue:  Login Brute Force Bot Network: No.31,Jin-ro
New Node: 46.229.164.102 Issue:   Intelligence Bot Network: ADVANCEDHOS
New Node: 104.238.51.50 Issue: Orphan  Scanner Network: SimpleLink LLC
New Node: 46.118.127.172 Issue: Fake Referrer Log  Bot Network:  / T:…
New Node: 31.184.194.114 Issue: Shellshock Exploiter Bot (shell  Download and

I want to carve these into the following fields:
Reason
IP Address
Issue
Network

I've managed to build two, reason and IP address:

WIBSREASON ((New Node: ?)|Abuse unresolved for [0-9]?[0-9][0-9]? days: ?)
WIBSIP ([0-9][0-9]?[0-9]?\.[0-9][0-9]?[0-9]?\.[0-9][0-9]?[0-9]?\.[0-9][0-9]?[0-9]?)

However, I am having trouble figuring out how to create one for Issue and one for Network without including the words "Issue" or "Network" themselves. The problem is delineating where Issue should stop. Any grok gurus out there who can help me? Also, FYI, the number of spaces between "Issue:" and the next word varies between one and three, if that matters.

Hi Walker,

The way I see it, you have just two patterns that need to be parsed.

The first pattern being the following:

and the second as:

You could use the patterns included in grok to parse this data; however, please note that grok is case-sensitive. Ideally you should know the number of spaces in advance, or write a custom regex pattern that tolerates multiple spaces. Here is a sample pattern I wrote to parse the data. You can easily reuse the same logic to parse the other data as well.

%{DATA:data}:%{SPACE}%{IP:ipaddr}%{SPACE}%{WORD:type}:%{GREEDYDATA:reason}
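To make it easier to see what that expression captures, here is a rough Ruby regex equivalent run against the first sample line. This is only a sketch: the expansions in the comments are approximations of the stock grok patterns, the IPv4 part is a simplified stand-in for grok's IP pattern, and `sample_captures` is a hypothetical helper name.

```ruby
# Approximate expansions of the grok patterns used above:
#   DATA -> .*?    SPACE -> \s*    WORD -> \b\w+\b    GREEDYDATA -> .*
# The IPv4 part is a simplified stand-in for grok's IP pattern (assumption).
SAMPLE_PATTERN = /
  (?<data>.*?):
  \s*
  (?<ipaddr>\d{1,3}(?:\.\d{1,3}){3})
  \s*
  (?<type>\b\w+\b):
  (?<reason>.*)
/x

# Hypothetical helper: returns the named captures as a Hash, or nil if no match.
def sample_captures(line)
  m = SAMPLE_PATTERN.match(line)
  m && m.named_captures
end

line = 'New Node: 212.55.78.183 Issue: Semalt Project  Spam Bot Network: LLC "McLaut-Invest"'
sample_captures(line).each { |name, value| puts "#{name}: #{value.inspect}" }
```

On that line, `type` comes out as the literal label `Issue`, and `reason` holds everything after it, including the `Network:` part.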

You can use the grok constructor included in X-Pack basic to create this pattern, or the grok debugger app (https://grokdebug.herokuapp.com).

Here is a list of the predefined patterns:
https://github.com/logstash-plugins/logstash-patterns-core/blob/master/patterns/grok-patterns

Regards,
N

I appreciate your time, but I have four fields, not two. Using the first event in the OP, I'd like to parse it out to:

FieldName Value
Reason New Node
IP Address 212.55.78.183
Issue Semalt Project Spam Bot
Network LLC "McLaut-Invest"

This then lets me further build out using geoip filtering on the IP address or analysis of the Issue field. I've got a workaround at the moment that involves regex and then mutate's gsub to remove all the crap I don't want...but, as you can see below, it's a lot of extra stuff and seems kinda hacky.

  if [user][screen_name] == "WebironBots" {
    if [extended_tweet][full_text] {
      ruby {
        code => "event.set('event', event.get('[extended_tweet][full_text]').scan(/New Node|Abuse unresolved for [0-9][0-9]?[0-9]? days/i))
                 event.set('ip_address', event.get('[extended_tweet][full_text]').scan(/[0-9][0-9]?[0-9]?\.[0-9][0-9]?[0-9]?\.[0-9][0-9]?[0-9]?\.[0-9][0-9]?[0-9]?/i))
                 event.set('issue', event.get('[extended_tweet][full_text]').scan(/Issue.*/i))
                 event.set('network', event.get('[extended_tweet][full_text]').scan(/Network.*/i))"
      }
    } else {
      ruby {
        code => "event.set('event', event.get('text_original').scan(/New Node|Abuse unresolved for [0-9][0-9]?[0-9]? days/i))
                 event.set('ip_address', event.get('text_original').scan(/[0-9][0-9]?[0-9]?\.[0-9][0-9]?[0-9]?\.[0-9][0-9]?[0-9]?\.[0-9][0-9]?[0-9]?/i))
                 event.set('issue', event.get('text_original').scan(/Issue.*/i))
                 event.set('network', event.get('text_original').scan(/Network.*/i))"
      }
    }
    geoip {
      source => "ip_address"
      fields => [
        "city_name",
        "country_code2",
        "country_code3",
        "country_name",
        "latitude",
        "longitude"
      ]
      tag_on_failure => false
    }
    mutate {
      gsub => [
        "issue", "Issue:\s*|Network.*|https?:\/\/\S* |\#", "",
        "network", "Network:\s*|https?:\/\/\S* |\#", ""
      ]
      add_field => {
        "coordinates" => "%{[geoip][latitude]}, %{[geoip][longitude]}"
      }
    }
  }
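One detail of the workaround above that may be worth knowing: `String#scan` returns an array of every match, so fields such as `ip_address` are set to an array rather than a single string. If a plain string is wanted, indexing the string with a regex returns only the first match (or nil). A small sketch in plain Ruby, outside Logstash; the `IPV4` constant and the helper names are made up for illustration:

```ruby
# Simplified IPv4 matcher (assumption: good enough for these log lines,
# it does not reject out-of-range octets such as 999).
IPV4 = /\d{1,3}(?:\.\d{1,3}){3}/

# scan collects every match into an Array.
def all_ips(text)
  text.scan(IPV4)
end

# String#[] with a Regexp returns the first match as a String, or nil.
def first_ip(text)
  text[IPV4]
end

text = 'New Node: 54.36.149.71 Issue:   Intelligence Bot Network: Unknown'
p all_ips(text)   # an Array with one element
p first_ip(text)  # a String
```

Whether the array form causes trouble depends on what the downstream filters expect, so treat this as something to check rather than a confirmed bug.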

Hi Walker,

I realize that. If you look at the expression I wrote, it does parse out four fields in total. I'll rewrite it to explain it better.

%{DATA:field_no_1}:%{SPACE}%{IP:field_no_2}%{SPACE}%{WORD:field_no_3}:%{GREEDYDATA:field_no_4}
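If the goal is specifically the Reason / IP Address / Issue / Network split, one possible variation is to match the `Issue:` and `Network:` labels literally and let a non-greedy capture stop Issue just before `Network:`. The sketch below shows the idea as a Ruby regex. It makes some assumptions: it only handles the "New Node" style lines, it treats `Network:` as optional because some sample lines are cut off before it, and `NODE_LINE` and `parse_node_line` are hypothetical names.

```ruby
# Labels are matched literally, so they never end up inside a captured field.
# The lazy .*? for issue stops at the first "Network:" (or at end of line).
NODE_LINE = /
  \A(?<reason>.*?):\s*
  (?<ip>\d{1,3}(?:\.\d{1,3}){3})\s+
  Issue:\s*(?<issue>.*?)\s*
  (?:Network:\s*(?<network>.*))?
  \z
/x

# Returns a Hash of the four fields, or nil when the line has another shape
# (for example the "Abuse unresolved ... host:" lines).
def parse_node_line(line)
  m = NODE_LINE.match(line)
  return nil unless m

  {
    'reason'  => m[:reason],
    'ip'      => m[:ip],
    'issue'   => m[:issue].squeeze(' '), # collapse runs of spaces
    'network' => m[:network]             # nil when "Network:" is missing
  }
end

p parse_node_line('New Node: 212.55.78.183 Issue: Semalt Project  Spam Bot Network: LLC "McLaut-Invest"')
```

The same idea in grok would presumably look something like `%{DATA:reason}:%{SPACE}%{IP:ip_address}%{SPACE}Issue:%{SPACE}%{DATA:issue}%{SPACE}Network:%{SPACE}%{GREEDYDATA:network}`, though that expression is untested here and would need a second pattern for the "Abuse unresolved" lines.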

Hope this helps. Let me know if you need any more help.

Regards,
N

This is what I get for only half paying attention. Thanks for the help, NerdSec.
