I have been working on a grok pattern that extracts the component from a URIPATH. I need some help configuring it so it extracts the piece of information that I want.
The message:

```
"GET /nexus/content/repositories/jts-development/com/jeppesen/jcms/xpress.26.01.22/26.01.22/xpress.26.01.22-26.01.22.pom HTTP/1.1"
```

The pattern I've configured:

```
\b\w+\b\s/nexus/content/repositories/jts-development/com/jeppesen/jcms/(?[^/]+)
```

What it extracts:

```
xpress.26.01.22
```

What I want it to extract:

```
com/jeppesen/jcms/xpress.26.01.22/26.01.22/xpress.26.01.22-26.01.22.pom
```
This has been driving me crazy. Any suggestions?
Please always format configuration snippets as code to avoid HTML stripping that turns, e.g., `(?<foo>[^/]+)` into `(?[^/]+)`.
The result is unsurprising: starting right after `/nexus/content/repositories/jts-development/com/jeppesen/jcms/`, `[^/]+` captures one or more characters up to the first forward slash, so you end up with `xpress.26.01.22`.
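Here is a minimal Python sketch of that behaviour. It approximates the grok pattern with Python's `re` module, which spells named groups `(?P<name>...)` rather than grok's `(?<name>...)`, and it uses a hypothetical group name, `component`, because the original name was lost to HTML stripping.

```python
import re

# The request string from the question (surrounding quotes included).
MESSAGE = ('"GET /nexus/content/repositories/jts-development/com/jeppesen/jcms/'
           'xpress.26.01.22/26.01.22/xpress.26.01.22-26.01.22.pom HTTP/1.1"')

# Approximation of the original pattern; "component" is a hypothetical name.
PATTERN = re.compile(
    r"\b\w+\b\s/nexus/content/repositories/jts-development/com/jeppesen/jcms/"
    r"(?P<component>[^/]+)"
)

match = PATTERN.search(MESSAGE)
# [^/]+ cannot cross a "/", so the capture ends at the next path separator.
print(match.group("component"))  # xpress.26.01.22
```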
I think you should start by parsing that part of the log into three fields:
- method: GET
- uripath: /nexus/content/repositories/jts-development/com/jeppesen/jcms/xpress.26.01.22/26.01.22/xpress.26.01.22-26.01.22.pom
- http_version: HTTP/1.1
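A sketch of that first split in Python, as an illustration only: the regex below is a hypothetical stand-in for whatever grok expression you end up using, and it assumes the quoted request string shown in the question.

```python
import re

MESSAGE = ('"GET /nexus/content/repositories/jts-development/com/jeppesen/jcms/'
           'xpress.26.01.22/26.01.22/xpress.26.01.22-26.01.22.pom HTTP/1.1"')

# Hypothetical regex: a method word, a path without spaces, then the HTTP version.
REQUEST = re.compile(
    r'"(?P<method>\w+) (?P<uripath>\S+) (?P<http_version>HTTP/[\d.]+)"'
)

fields = REQUEST.search(MESSAGE).groupdict()
for name, value in fields.items():
    print(f"{name}: {value}")
# method: GET
# uripath: /nexus/content/repositories/jts-development/com/jeppesen/jcms/xpress.26.01.22/26.01.22/xpress.26.01.22-26.01.22.pom
# http_version: HTTP/1.1
```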
Then parse the `uripath` field with an expression like this:

```
/nexus/content/repositories/jts-development/%{GREEDYDATA:repofile}
```
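In grok's stock patterns `GREEDYDATA` is defined as `.*`, so this expression amounts to "everything after the prefix". Because `uripath` no longer contains the HTTP version, that is exactly the part you want. A Python approximation of this step:

```python
import re

URIPATH = ("/nexus/content/repositories/jts-development/com/jeppesen/jcms/"
           "xpress.26.01.22/26.01.22/xpress.26.01.22-26.01.22.pom")

# %{GREEDYDATA:repofile} expands to a named group around ".*".
REPOFILE = re.compile(r"/nexus/content/repositories/jts-development/(?P<repofile>.*)")

repofile = REPOFILE.search(URIPATH).group("repofile")
print(repofile)
# com/jeppesen/jcms/xpress.26.01.22/26.01.22/xpress.26.01.22-26.01.22.pom
```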
If you insist on the approach you've started, you can do this:

```
\b\w+\b\s/nexus/content/repositories/jts-development/%{NOTSPACE:repofile}
```
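Using `NOTSPACE` rather than `GREEDYDATA` matters when matching the whole request string. In grok's stock patterns `NOTSPACE` is `\S+` and `GREEDYDATA` is `.*`; this Python sketch (an approximation, not grok itself) shows that `\S+` stops before ` HTTP/1.1` while `.*` would swallow it.

```python
import re

MESSAGE = ('"GET /nexus/content/repositories/jts-development/com/jeppesen/jcms/'
           'xpress.26.01.22/26.01.22/xpress.26.01.22-26.01.22.pom HTTP/1.1"')

PREFIX = r"\b\w+\b\s/nexus/content/repositories/jts-development/"

# %{NOTSPACE:repofile}: "\S+" stops at the space before the HTTP version.
notspace = re.search(PREFIX + r"(?P<repofile>\S+)", MESSAGE).group("repofile")
# %{GREEDYDATA:repofile}: ".*" runs to the end of the string.
greedy = re.search(PREFIX + r"(?P<repofile>.*)", MESSAGE).group("repofile")

print(notspace)
# com/jeppesen/jcms/xpress.26.01.22/26.01.22/xpress.26.01.22-26.01.22.pom
print(greedy)
# com/jeppesen/jcms/xpress.26.01.22/26.01.22/xpress.26.01.22-26.01.22.pom HTTP/1.1"
```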
Is it possible to only get URIPATHs that end in ".pom"?
I'll assume you're sticking to the first solution of parsing the log into three fields. Sure, it's possible. You could, e.g., wrap the second grok filter in a conditional:
```
if [uripath] =~ /\.pom$/ {
  grok {
    match => [
      "uripath",
      "/nexus/content/repositories/jts-development/%{GREEDYDATA:repofile}"
    ]
  }
}
```
or unconditionally grok the field but disable the `_grokparsefailure` tag:
```
grok {
  match => [
    "uripath",
    "/nexus/content/repositories/jts-development/(?<repofile>.*\.pom)$"
  ]
  tag_on_failure => []
}
```
Since presumably not all `uripath` values will begin with `/nexus/content/repositories/jts-development`, I'd probably go with the second solution.
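The trade-off can be sketched in Python. This is a rough model of the two filters, not Logstash itself; `first_solution`, `second_solution`, and the `example.pom` path under `/nexus/content/groups/public/` are hypothetical. Both extract the same `repofile` from a jts-development `.pom`, but with the first, a `.pom` from another location passes the conditional, fails the grok, and gets the default `_grokparsefailure` tag.

```python
import re

# First solution: GREEDYDATA (".*") behind a .pom conditional.
GREEDY = re.compile(r"/nexus/content/repositories/jts-development/(?P<repofile>.*)")
# Second solution: the .pom check folded into an anchored pattern.
ANCHORED = re.compile(
    r"/nexus/content/repositories/jts-development/(?P<repofile>.*\.pom)$"
)


def first_solution(uripath: str) -> dict:
    """Conditional grok with grok's default failure tagging (rough model)."""
    event = {"uripath": uripath}
    if re.search(r"\.pom$", uripath):
        match = GREEDY.search(uripath)
        if match:
            event["repofile"] = match.group("repofile")
        else:
            event["tags"] = ["_grokparsefailure"]
    return event


def second_solution(uripath: str) -> dict:
    """Unconditional grok with tag_on_failure => [] (rough model)."""
    event = {"uripath": uripath}
    match = ANCHORED.search(uripath)
    if match:
        event["repofile"] = match.group("repofile")
    # A miss adds nothing: no field and no tag.
    return event


jts_pom = ("/nexus/content/repositories/jts-development/com/jeppesen/jcms/"
           "xpress.26.01.22/26.01.22/xpress.26.01.22-26.01.22.pom")
other_pom = "/nexus/content/groups/public/com/jeppesen/jcms/example.pom"  # hypothetical

print(first_solution(other_pom))
# {'uripath': '/nexus/content/groups/public/com/jeppesen/jcms/example.pom', 'tags': ['_grokparsefailure']}
print(second_solution(other_pom))
# {'uripath': '/nexus/content/groups/public/com/jeppesen/jcms/example.pom'}
```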
Hi!
I tried your second solution, but it doesn't seem to create the field. Is there something I have missed?
```
filter {
  grok {
    type => "nexus-log"
    break_on_match => false
    match => [
      "message", "\b\w+\b\s/nexus/content/repositories/(?<repositories>[^/]+)",
      "message", "(?<mytimestamp>%{MONTHDAY}/%{MONTH}/%{YEAR}:%{HOUR}:%{MINUTE}:%{SECOND} %{ISO8601_TIMEZONE})",
      "message", "(%{WORD:requesttype reps}) /nexus/content/repositories/",
      "message", "(%{WORD:requesttype groups}) /nexus/content/groups/public/com/jeppesen/jcms/",
      "message", "\b\w+\b\s/nexus/content/repositories/jts-development/(?<repofile>.*\.pom)$",
      "message", "\b\w+\b\s/nexus/content/groups/public/com/jeppesen/jcms/(?<groups>[^/]+)"
    ]
  }
  date {
    match => ["mytimestamp", "dd/MMM/YYYY:HH:mm:ss Z"]
    remove_field => ["mytimestamp"]
  }
}
```
I double-checked it with the grok debugger and it produced the result I wanted, so it must be something that Logstash doesn't like.
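One thing worth checking, offered as a guess since the actual log line isn't shown here: the anchored pattern ends in `$`, which works against `uripath` (it ends in `.pom`), but in this config it is matched against `message`. If `message` holds the full request string as in the first post, `.pom` is followed by ` HTTP/1.1"`, so `$` can never match there. A quick Python check of that assumption (an approximation of grok's regex engine; the `\s`-terminated variant is a hypothetical alternative, not a tested Logstash fix):

```python
import re

REQUEST = ('"GET /nexus/content/repositories/jts-development/com/jeppesen/jcms/'
           'xpress.26.01.22/26.01.22/xpress.26.01.22-26.01.22.pom HTTP/1.1"')

# The pattern as used against "message": ".pom" must sit at the very end.
ANCHORED = re.compile(
    r"\b\w+\b\s/nexus/content/repositories/jts-development/(?P<repofile>.*\.pom)$"
)
# Hypothetical variant: ".pom" followed by whitespace instead of end of string.
SPACE_TERMINATED = re.compile(
    r"\b\w+\b\s/nexus/content/repositories/jts-development/(?P<repofile>\S*\.pom)\s"
)

print(ANCHORED.search(REQUEST))  # None
print(SPACE_TERMINATED.search(REQUEST).group("repofile"))
# com/jeppesen/jcms/xpress.26.01.22/26.01.22/xpress.26.01.22-26.01.22.pom
```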