Help with a grok pattern

I have been working on a grok pattern that extracts the component from a URIPATH. I need some help configuring it so it extracts the piece of information that I want.

The message

"GET /nexus/content/repositories/jts-development/com/jeppesen/jcms/xpress.26.01.22/26.01.22/xpress.26.01.22-26.01.22.pom HTTP/1.1"

The pattern I've configured

\b\w+\b\s/nexus/content/repositories/jts-development/com/jeppesen/jcms/(?[^/]+)

What it extracts

xpress.26.01.22

What I want it to extract

com/jeppesen/jcms/xpress.26.01.22/26.01.22/xpress.26.01.22-26.01.22.pom

This has been driving me crazy. Any suggestions?

Please always format configuration snippets as code to avoid HTML stripping that turns e.g. (?<foo>[^/]+) into (?[^/]+).

The results you get are unsurprising; you extract one or more characters until you hit the first forward slash, starting after /nexus/content/repositories/jts-development/com/jeppesen/jcms/. Then you end up with xpress.26.01.22.

I think you should start by parsing that part of the log into three fields,

  • method: GET
  • uripath: /nexus/content/repositories/jts-development/com/jeppesen/jcms/xpress.26.01.22/26.01.22/xpress.26.01.22-26.01.22.pom
  • http_version: HTTP/1.1
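
A minimal sketch of that first step, assuming the request line sits in the message field. The field names simply mirror the list above, and the pattern itself is only a suggestion that may need adjusting to the rest of your log line:

```
grok {
  # Split the request line into method, uripath, and http_version.
  # NOTSPACE stops at the first space, so uripath holds the whole path.
  match => [
    "message",
    "%{WORD:method} %{NOTSPACE:uripath} (?<http_version>HTTP/%{NUMBER})"
  ]
}
```

For the sample message this should give method GET, the full path in uripath, and http_version HTTP/1.1.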

Then, parse the uripath field with an expression like this:

/nexus/content/repositories/jts-development/%{GREEDYDATA:repofile}

If you insist on the approach you've started you can do this:

\b\w+\b\s/nexus/content/repositories/jts-development/%{NOTSPACE:repofile}

Is it possible to get only the URIPATHs that end in ".pom"?

I'll assume you're sticking to the first solution of parsing the log into three fields. Sure, it's possible. You could e.g. wrap the second grok in a conditional,

if [uripath] =~ /\.pom$/ {
  grok {
    match => [
      "uripath",
      "/nexus/content/repositories/jts-development/%{GREEDYDATA:repofile}"
    ]
  }
}

or unconditionally grok the field but disable the _grokparsefailure tag:

grok {
  match => [
    "uripath",
    "/nexus/content/repositories/jts-development/(?<repofile>.*\.pom)$"
  ]
  tag_on_failure => []
}

Since presumably not all uripath values will begin with /nexus/content/repositories/jts-development, I'd probably go with the second solution.

Hi!

I tried your second solution, but it doesn't seem to create the repofile field. Is there something I have missed?

filter {

   grok {

     type => "nexus-log"
     break_on_match => false

     match => [
        "message", "\b\w+\b\s/nexus/content/repositories/(?<repositories>[^/]+)",
        "message", "(?<mytimestamp>%{MONTHDAY}/%{MONTH}/%{YEAR}:%{HOUR}:%{MINUTE}:%{SECOND} %{ISO8601_TIMEZONE})",
        "message", "(%{WORD:requesttype reps}) /nexus/content/repositories/",
        "message", "(%{WORD:requesttype groups}) /nexus/content/groups/public/com/jeppesen/jcms/",
        "message", "\b\w+\b\s/nexus/content/repositories/jts-development/(?<repofile>.*\.pom)$",
        "message", "\b\w+\b\s/nexus/content/groups/public/com/jeppesen/jcms/(?<groups>[^/]+)"
      ]
   }
   date{
      match => ["mytimestamp", "dd/MMM/YYYY:HH:mm:ss Z" ]
      remove_field => ["mytimestamp"]
   }

}

I double-checked it with the grok debugger and it produced the result I wanted. It must be something that Logstash doesn't like.