Regular expression problem

simonrisberg · July 29, 2015, 1:09pm

Hi!

I'm trying to make a regular expression to get out a certain word that sits next to a URIPATH but is not actually in the path itself. So far I've managed to do that, but when I put it into my Logstash configuration it doesn't like the syntax, so it "gracefully" stops ELK from starting up. I know that my pattern is correct because I've tried it with a grok debugger.

Typical event message: 10.67.6.51 - - [21/Jun/2015:21:14:21 +0000] "GET /nexus/content/repositories/jts-development/com/jeppesen/jcms/maven-metadata.xml.sha1 HTTP/1.1" 200 40

My expression: "(?<requesttype>[^/]+) /nexus/content/repositories/"

What shows up in the grok debugger: "GET"

How my logstash configuration looks: (it's the last pattern in the grok filter)

grok {

     type => "nexus-log"
     break_on_match => false

     match => [
        "message", "\b\w+\b\s/nexus/content/repositories/(?<repositories>[^/]+)",
        "message", "(?<mytimestamp>%{MONTHDAY}/%{MONTH}/%{YEAR}:%{HOUR}:%{MINUTE}:%{SECOND} %{ISO8601_TIMEZONE})",
        "message", " "(?<requesttype>[^/]+) /nexus/content/repositories/"
      ]
   }

magnusbaeck · July 29, 2015, 3:14pm

If you want a double quote inside your expression you need to escape it with a backslash. That's most likely why Logstash doesn't start.
I can only assume that this expression results in a trailing space at the end of the resulting requesttype field. Why not just use %{WORD:requesttype} to match the HTTP method? They never contain spaces anyway.
It would've been way easier to just use the predefined grok pattern for this kind of logfile (it looks like an Apache common file) to get everything into separate fields without any custom expressions at all.
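
A minimal sketch of the predefined-pattern approach, assuming the events really are in Apache common log format and that the `nexus-log` type from the question is set on them. `COMMONAPACHELOG` ships with Logstash's bundled grok patterns; the field names it produces (for example `verb` for the HTTP method and `request` for the path in the pattern set of that era) depend on the Logstash version, so check them against your own output.

```
filter {
  # Assumption: events carry the "nexus-log" type used in the question.
  if [type] == "nexus-log" {
    grok {
      # One predefined pattern splits the whole line into separate fields.
      match => [ "message", "%{COMMONAPACHELOG}" ]
    }
  }
}
```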

simonrisberg · July 30, 2015, 7:31am

Thank you, I'm gonna try to use the WORD pattern. If that doesn't work, where should I insert the backslash?

magnusbaeck · July 30, 2015, 7:33am

Use the backslash to escape double quotes that occur within the regular expressions. Or, you could make the regular expression single-quoted (i.e. it's delimited by single quotes rather than double quotes).
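
The two quoting options could look roughly like this. It is only a sketch: it reuses the `%{WORD:requesttype}` expression from this thread, and in practice only one of the two filters would be used.

```
# Option 1: double-quoted string, with the inner double quote escaped by a backslash.
grok {
  match => [ "message", "\"%{WORD:requesttype} /nexus/content/repositories/" ]
}

# Option 2: single-quoted string, so the inner double quote needs no escaping.
grok {
  match => [ "message", '"%{WORD:requesttype} /nexus/content/repositories/' ]
}
```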

simonrisberg · July 30, 2015, 8:14am

Thanks for the help. The predefined pattern worked just fine, although a new problem has arisen. When the pattern doesn't match anything on certain events (which is correct, because it shouldn't), it still shows some kind of result, but it becomes a "-". Is there any way to get rid of that? I'm guessing it's some kind of grokparsefailure?

Picture below.

magnusbaeck · July 30, 2015, 1:15pm

What do those messages look like in full and what's your filter configuration?

simonrisberg · July 30, 2015, 1:39pm

The message in full looks like this

My filter configuration looks like this

filter {

   grok {

     type => "nexus-log"
     break_on_match => false

     match => [
        "message", "\b\w+\b\s/nexus/content/repositories/(?<repositories>[^/]+)",
        "message", "(?<mytimestamp>%{MONTHDAY}/%{MONTH}/%{YEAR}:%{HOUR}:%{MINUTE}:%{SECOND} %{ISO8601_TIMEZONE})",
        "message", "(%{WORD:requesttype}) /nexus/content/repositories/"
      ]
   }
   date{
      match => ["mytimestamp", "dd/MMM/YYYY:HH:mm:ss Z" ]
      remove_field => ["mytimestamp"]
   }

}

Note that this is nothing that really needs an urgent fix although it would look nicer.

magnusbaeck · July 30, 2015, 2:39pm

I'm not sure exactly how break_on_match affects the addition of the _grokparsefailure tag, but if the tag is added unless all expressions match then that's clearly the reason since /nexus doesn't match /nexus/content/repositories.

simonrisberg · July 30, 2015, 2:45pm

I understand. Well I need to have the break_on_match function so I guess I'll just have to live with it.

magnusbaeck · July 30, 2015, 3:55pm

No, you don't need break_on_match. You could easily merge all three expressions into a single expression. Or, as mentioned previously, use a generic pattern to do the bulk of the parsing instead of reinventing the wheel.
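
One way the three expressions might be merged, as a sketch rather than a tested configuration. It assumes every line of interest has the shape of the sample event in the first post and swaps the hand-built timestamp expression for the predefined `HTTPDATE` pattern. The field names are the ones already used in this thread.

```
filter {
  grok {
    # Single-quoted so the literal double quote before the HTTP method needs no escaping.
    match => [ "message", '\[%{HTTPDATE:mytimestamp}\] "%{WORD:requesttype} /nexus/content/repositories/(?<repositories>[^/]+)' ]
  }
  date {
    match => [ "mytimestamp", "dd/MMM/YYYY:HH:mm:ss Z" ]
    remove_field => [ "mytimestamp" ]
  }
}
```

With a single expression, events whose path is not under /nexus/content/repositories/ fail the whole match and get the _grokparsefailure tag by default. Setting `tag_on_failure => []` on the grok filter is one way to suppress the tag if those misses are expected.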

simonrisberg · July 31, 2015, 9:33am

I can see how I might be able to use a generic pattern for the "repositories" pattern that I have created, but I don't really see it happening for the "mytimestamp" part. I'm not entirely sure how to merge them into a single expression either. Wouldn't that look pretty strange?

magnusbaeck · July 31, 2015, 6:07pm

No, why would it be strange? But yes, because you're extracting the repositories field from the URI, you can't use the predefined grok patterns out of the box, but you could certainly use them as a starting point. You're attempting to parse a single line, so it makes perfect sense to use a single expression for the parsing.
