Regular expression problem


(Simon Risberg) #1

Hi!

I'm trying to write a regular expression that extracts a certain word that sits next to a URIPATH but is not actually part of the path itself. I've managed to do that, but when I put it into my Logstash configuration it doesn't like the syntax, so it "gracefully" stops ELK from starting up. I know that my pattern is correct because I've tried it in a grok debugger.

Typical event message: 10.67.6.51 - - [21/Jun/2015:21:14:21 +0000] "GET /nexus/content/repositories/jts-development/com/jeppesen/jcms/maven-metadata.xml.sha1 HTTP/1.1" 200 40

My expression: "(?<requesttype>[^/]+) /nexus/content/repositories/"

What shows up in the grok debugger: "GET"

How my logstash configuration looks: (it's the last pattern in the grok filter)

grok {

     type => "nexus-log"
     break_on_match => false

     match => [
        "message", "\b\w+\b\s/nexus/content/repositories/(?<repositories>[^/]+)",
        "message", "(?<mytimestamp>%{MONTHDAY}/%{MONTH}/%{YEAR}:%{HOUR}:%{MINUTE}:%{SECOND} %{ISO8601_TIMEZONE})",
        "message", " "(?<requesttype>[^/]+) /nexus/content/repositories/"
      ]
   }

(Magnus Bäck) #2
  • If you want a double quote inside your expression you need to escape it with a backslash. That's most likely why Logstash doesn't start.
  • I can only assume that this expression results in a trailing space at the end of the resulting requesttype field. Why not just use %{WORD:requesttype} to match the HTTP method? They never contain spaces anyway.
  • It would've been way easier to just use the predefined grok pattern for this kind of logfile (it looks like an Apache common file) to get everything into separate fields without any custom expressions at all.
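
As a rough sketch of the second and third points: the block below assumes the log really is in Apache common format, and the field names produced by %{COMMONAPACHELOG} depend on the pattern definitions shipped with your Logstash version, so treat it as a starting point rather than a tested configuration.

```
filter {
  grok {
    # Point 2: %{WORD} matches only word characters, so the captured
    # HTTP method cannot include the space that follows it.
    match => [ "message", "%{WORD:requesttype} /nexus/content/repositories/" ]

    # Point 3 (alternative): replace the match above with the predefined
    # pattern, which splits the whole line into separate fields:
    # match => [ "message", "%{COMMONAPACHELOG}" ]
  }
}
```

Neither expression contains a literal double quote, so the quoting problem from the first point does not come up here.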

(Simon Risberg) #3

Thank you, I'm gonna try to use the WORD pattern. If that doesn't work, where should I insert the backslash?


(Magnus Bäck) #4

Use the backslash to escape double quotes that occur within the regular expressions. Or, you could make the regular expression single-quoted (i.e. it's delimited by single quotes rather than double quotes).
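
A minimal sketch of the two options, applied to the expression from post #1 (only one of the two match lines would be used in practice; the second is commented out):

```
filter {
  grok {
    # Option 1: keep the double-quoted string and escape the inner quote
    match => [ "message", "\"(?<requesttype>[^/]+) /nexus/content/repositories/" ]

    # Option 2: delimit the string with single quotes instead, so the
    # inner double quote needs no escaping
    # match => [ "message", '"(?<requesttype>[^/]+) /nexus/content/repositories/' ]
  }
}
```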


(Simon Risberg) #5

Thanks for the help. The predefined pattern worked just fine, although a new problem has arisen. When the pattern doesn't match anything on certain events (which is correct, because it shouldn't), it still shows some kind of result, but it becomes a "-". Is there any way to get rid of that? I'm guessing it's some kind of grokparsefailure?

Picture below.


(Magnus Bäck) #6

What do those messages look like in full and what's your filter configuration?


(Simon Risberg) #7

The message in full looks like this

My filter configuration looks like this

filter {

   grok {

     type => "nexus-log"
     break_on_match => false

     match => [
        "message", "\b\w+\b\s/nexus/content/repositories/(?<repositories>[^/]+)",
        "message", "(?<mytimestamp>%{MONTHDAY}/%{MONTH}/%{YEAR}:%{HOUR}:%{MINUTE}:%{SECOND} %{ISO8601_TIMEZONE})",
        "message", "(%{WORD:requesttype}) /nexus/content/repositories/"
      ]
   }
   date{
      match => ["mytimestamp", "dd/MMM/YYYY:HH:mm:ss Z" ]
      remove_field => ["mytimestamp"]
   }

}

Note that this is nothing that really needs an urgent fix although it would look nicer.


(Magnus Bäck) #8

I'm not sure exactly how break_on_match affects the addition of the _grokparsefailure tag, but if the tag is added unless all expressions match then that's clearly the reason since /nexus doesn't match /nexus/content/repositories.


(Simon Risberg) #9

I understand. Well I need to have the break_on_match function so I guess I'll just have to live with it.


(Magnus Bäck) #10

No, you don't need break_on_match. You could easily merge all three expressions into a single expression. Or, as mentioned previously, use a generic pattern to do the bulk of the parsing instead of reinventing the wheel.


(Simon Risberg) #11

I can see how I might be able to use a generic pattern for the "repositories" pattern that I have created, but I don't really see it happening for the "mytimestamp" part. I'm not entirely sure how to merge them into a single expression either. Wouldn't that look pretty strange?


(Magnus Bäck) #12

No, why would it be strange? But yes, because you're extracting the repositories field from the URI, you can't use the predefined grok patterns out of the box, but you could certainly use them as a starting point. You're attempting to parse a single line, so it makes perfect sense to use a single expression for the parsing.
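
To make the suggestion concrete, here is a rough sketch of the three expressions from post #7 merged into one. It assumes every line of interest has the bracketed timestamp followed directly by the quoted request, as in the sample event in post #1. It has not been tested against the real log, and single quotes are used only so that the literal double quote needs no escaping.

```
filter {
  grok {
    # One expression, read left to right along the log line:
    #   [timestamp] "METHOD /nexus/content/repositories/<repository>
    match => [
      "message", '\[(?<mytimestamp>%{MONTHDAY}/%{MONTH}/%{YEAR}:%{HOUR}:%{MINUTE}:%{SECOND} %{ISO8601_TIMEZONE})\] "%{WORD:requesttype} /nexus/content/repositories/(?<repositories>[^/]+)'
    ]
  }
  date {
    match => [ "mytimestamp", "dd/MMM/YYYY:HH:mm:ss Z" ]
    remove_field => [ "mytimestamp" ]
  }
}
```

With a single expression, break_on_match is no longer needed. The trade-off is that a line whose path is not under /nexus/content/repositories/ will not match at all, so it gets the default _grokparsefailure tag and no mytimestamp field is extracted for it either.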

