Hi!
I'm trying to write a regular expression that extracts a certain word that sits next to the URIPATH but is not actually part of the path itself. The pattern itself works, but when I put it into my Logstash configuration, Logstash doesn't like the syntax and "gracefully" stops ELK from starting up. I know that my pattern is correct because I've tried it in a grok debugger.
Typical event message: 10.67.6.51 - - [21/Jun/2015:21:14:21 +0000] "GET /nexus/content/repositories/jts-development/com/jeppesen/jcms/maven-metadata.xml.sha1 HTTP/1.1" 200 40
My expression: "(?<requesttype>[^/]+) /nexus/content/repositories/"
What shows up in the grok debugger: "GET"
How my logstash configuration looks: (it's the last pattern in the grok filter)
grok {
  type => "nexus-log"
  break_on_match => false
  match => [
    "message", "\b\w+\b\s/nexus/content/repositories/(?<repositories>[^/]+)",
    "message", "(?<mytimestamp>%{MONTHDAY}/%{MONTH}/%{YEAR}:%{HOUR}:%{MINUTE}:%{SECOND} %{ISO8601_TIMEZONE})",
    "message", " "(?<requesttype>[^/]+) /nexus/content/repositories/"
  ]
}
Thank you, I'm gonna try to use the WORD pattern. If that doesn't work, where should I insert the backslash?
Use the backslash to escape double quotes that occur within the regular expressions. Or, you could make the regular expression single-quoted (i.e. it's delimited by single quotes rather than double quotes).
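For illustration, the two options could look roughly like this when applied to the third pattern from the configuration above. This is only a sketch, assuming the intent is to match the literal double quote that precedes the request method in the log line. How backslash escapes inside double-quoted strings are handled may depend on the Logstash version, so the single-quoted form is probably the safer choice.

```
# Option 1: escape the literal double quote inside a double-quoted string
grok {
  match => [
    "message", "\"(?<requesttype>[^/]+) /nexus/content/repositories/"
  ]
}

# Option 2: delimit the expression with single quotes, so the double quote needs no escaping
grok {
  match => [
    "message", '"(?<requesttype>[^/]+) /nexus/content/repositories/'
  ]
}
```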
Thanks for the help. The predefined pattern worked just fine, although a new problem has come up. On certain events the pattern doesn't match anything, which is correct because it shouldn't, but the field still shows some kind of result: it becomes a "-". Is there any way to get rid of that? I'm guessing it's some kind of grokparsefailure?
Picture below.
What do those messages look like in full and what's your filter configuration?
The message in full looks like this
My filter configuration looks like this
filter {
  grok {
    type => "nexus-log"
    break_on_match => false
    match => [
      "message", "\b\w+\b\s/nexus/content/repositories/(?<repositories>[^/]+)",
      "message", "(?<mytimestamp>%{MONTHDAY}/%{MONTH}/%{YEAR}:%{HOUR}:%{MINUTE}:%{SECOND} %{ISO8601_TIMEZONE})",
      "message", "(%{WORD:requesttype}) /nexus/content/repositories/"
    ]
  }
  date {
    match => ["mytimestamp", "dd/MMM/YYYY:HH:mm:ss Z"]
    remove_field => ["mytimestamp"]
  }
}
Note that this is nothing that really needs an urgent fix although it would look nicer.
I'm not sure exactly how break_on_match affects the addition of the _grokparsefailure tag, but if the tag is added unless all expressions match, then that's clearly the reason, since /nexus doesn't match /nexus/content/repositories.
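If the only goal is to keep the tag from showing up, the grok filter has a tag_on_failure option (its default is ["_grokparsefailure"]) that can be set to an empty list. A minimal sketch, reusing the patterns from the configuration above. Whether this also gets rid of the "-" is an assumption, since that may simply be how a missing field is displayed.

```
grok {
  type => "nexus-log"
  break_on_match => false
  # Assumption: an empty list means no tag is added when a pattern fails to match
  tag_on_failure => []
  match => [
    "message", "\b\w+\b\s/nexus/content/repositories/(?<repositories>[^/]+)",
    "message", "(?<mytimestamp>%{MONTHDAY}/%{MONTH}/%{YEAR}:%{HOUR}:%{MINUTE}:%{SECOND} %{ISO8601_TIMEZONE})",
    "message", "(%{WORD:requesttype}) /nexus/content/repositories/"
  ]
}
```

Note that this hides the tag for genuine parse failures as well, so it trades a cleaner display for less visibility into lines that did not parse.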
I understand. Well I need to have the break_on_match function so I guess I'll just have to live with it.
No, you don't need break_on_match. You could easily merge all three expressions into a single expression. Or, as mentioned previously, use a generic pattern to do the bulk of the parsing instead of reinventing the wheel.
I can see how I might be able to use a generic pattern for the "repositories" pattern that I have created, but I don't really see it happening for the "mytimestamp" part. I'm not entirely sure how to merge them into a single expression either. Wouldn't that look pretty strange?
No, why would it be strange? But yes, because you're extracting the repositories field from the URI you can't use the predefined grok patterns out of the box, but you could certainly use them as a starting point. You're attempting to parse a single line, so it makes perfect sense to use a single expression for the parsing.
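As a rough sketch of what a merged expression might look like, the three patterns from the configuration above can be strung together in the order the fields appear in the sample event. The single-quoted string and the literal \[ ... \] around the timestamp are assumptions based on that one sample line, so this is a starting point rather than a tested configuration.

```
filter {
  grok {
    type => "nexus-log"
    match => [
      "message", '\[(?<mytimestamp>%{MONTHDAY}/%{MONTH}/%{YEAR}:%{HOUR}:%{MINUTE}:%{SECOND} %{ISO8601_TIMEZONE})\] "%{WORD:requesttype} /nexus/content/repositories/(?<repositories>[^/]+)'
    ]
  }
  date {
    match => ["mytimestamp", "dd/MMM/YYYY:HH:mm:ss Z"]
    remove_field => ["mytimestamp"]
  }
}
```

With a single expression, a line whose path is not under /nexus/content/repositories/ will not match at all, so no timestamp is extracted for it either. If those lines should still be parsed, the repository part would have to be made optional.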