Extracting part of a URIPATH in Logstash

Hi!

My name is Simon and I'm pretty new to ELK. I have been working with it for a few weeks now and I'm starting to understand how everything works (I think?). Back to the point, I am right now using Logstash to read from a nexus log file which shows which repositories are being updated. The problem is that the log spits out a message with a whole path to the specific repository. I want to make a custom grok filter that only extracts the specific repository.

The paths follow a pattern: the segment "repositories" always comes directly before the specific repository name, as in "nexus/blablabla/repositories/monkey".

I want to extract the "monkey" part so I can get useful data out of it when I'm visualizing it in Kibana 4. So far I've managed to create a custom grok filter that gets it down to "repositories/monkey" but that's it.

Any help would be greatly appreciated.

Best Regards

Simon

So, given nexus/blablabla/repositories/monkey you want to extract "monkey"? This should work:

grok {
  match => ["uri", "/repositories/(?<repo>[^/]+)"]
}

Thank you. I will try this! Can I just write the regular expression directly into the logstash config file without using a grok pattern? I didn't know that.

Hello again. I tried putting in your pattern. Unfortunately I got the same result as I did with the old one I created, which was "repositories/my-repository-name". Now my field also shows up under "missing fields". What does this mean?

Best regards.

Can I just write the regular expression directly into the logstash config file without using a grok pattern? I didn't know that.

Grok is just a convenience layer on top of plain old regular expressions.

I tried putting in your pattern. Unfortunately I got the same result as I did with the old one I created, which was "repositories/my-repository-name".

What, exactly, did you try? The following minimal example works as expected:

$ cat test.config 
input { stdin { codec => plain } }
output { stdout { codec => rubydebug } }
filter {
  grok {
    match => ["message", "/repositories/(?<repo>[^/]+)"]
  }
}
$ cat data
/foo/bar/repositories/baz
$ /opt/logstash/bin/logstash -f test.config < data
{
       "message" => "/foo/bar/repositories/baz",
      "@version" => "1",
    "@timestamp" => "2015-06-30T07:30:12.942Z",
          "host" => "redacted",
          "repo" => "baz"
}

Now my field also shows up under "missing fields". What does this mean?

Where are you getting this?

This is what my logstash config looks like right now. I'm using a custom grok pattern called NEXUSREP that I've created. I have it in a directory called "patterns" which is located in the same folder as the logstash config. The one you see below is the filter part of my logstash config. I get the "missing fields" part in Kibana 4 because I can't locate it as a field on the left side.

filter {
  grok {
    type => "nexus-log"
    patterns_dir => "./config-dir/patterns"
    match => ["message", "%{NEXUSREP:repository}"]
  }
}

I tried to put the regular expression directly into my "match" path after the message part instead of referring to my custom grok filter. I see that you are jumping between two different files here. I only have the logstash config file to go after.

What's your definition of NEXUSREP?

I tried to put the regular expression directly into my "match" path after the message part instead of referring to my custom grok filter.

That's what I'd do too. Didn't this work?

I see that you are jumping between two different files here. I only have the logstash config file to go after.

In my example the "data" file is the test input file. Or what are you referring to?

Hello, sorry if I was a little unclear. Your solution did work; it was just me who messed it up a little bit. I really need to get better at using regular expressions. I guess I can take your regular expression and put it into a custom grok filter and refer to it?

Thank you so much for your help. I do have one last question, though. When I use "Terms" to visualize all the repositories, the repositories often have names like "hej-hello-hi" or "monkey-hey-monkey", and Kibana splits them up and treats, for example, "hej" and "hello" as separate repositories, which is a bit of a problem. Do you know a solution for this?

I guess I can take your regular expression and put it into a custom grok filter and refer to it?

Yes, that should be fine.
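
As a rough sketch of that setup, assuming a hypothetical pattern named NEXUSREPONAME (not the NEXUSREP definition from this thread, which was never posted) stored in a file under the patterns directory. Here the pattern covers only the repository name, and the literal /repositories/ prefix stays outside the capture:

```
# Hypothetical patterns file ./config-dir/patterns/nexus containing one line:
#   NEXUSREPONAME [^/]+
filter {
  grok {
    patterns_dir => "./config-dir/patterns"
    # The literal "/repositories/" is matched but sits outside the capture,
    # so the "repo" field holds only the repository name.
    match => ["message", "/repositories/%{NEXUSREPONAME:repo}"]
  }
}
```

If the custom pattern itself included the repositories/ prefix, the captured field would contain that prefix too, which may be what produced the "repositories/my-repository-name" result described earlier.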

Thank you so much for your help. I do have one last question, though. When I use "Terms" to visualize all the repositories, the repositories often have names like "hej-hello-hi" or "monkey-hey-monkey", and Kibana splits them up and treats, for example, "hej" and "hello" as separate repositories, which is a bit of a problem. Do you know a solution for this?

That's a completely different question, but in short you need to make sure the field isn't analyzed (or is analyzed differently), typically done by configuring a custom index template and setting the field in question to not_analyzed.
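
A minimal sketch of how a custom template might be wired in from the Logstash side, assuming the elasticsearch output is in use. The file path is hypothetical, the connection settings are omitted, and the JSON template itself still has to be written by hand:

```
output {
  elasticsearch {
    # Hypothetical path. The JSON template in this file would map the "repo"
    # field as a string with "index": "not_analyzed".
    template => "/etc/logstash/nexus-template.json"
    # Replace a template of the same name that already exists in Elasticsearch.
    template_overwrite => true
  }
}
```

Templates only apply to indices created after the template is installed, so existing indices keep their old mapping. Also note that the default template shipped with Logstash of this era already adds a not_analyzed ".raw" subfield for string fields in logstash-* indices, which may be enough on its own.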

Thank you very much. You've been really helpful to a newbie like me.

One more thing. I restarted Logstash, Elasticsearch and Kibana. Now it puts the "repo" field under "missing fields". Does this mean that it's not indexed or analyzed?

EDIT: No worries I fixed it. I just had to go and re-index the fields under the settings tab in Kibana.

Now the only thing left is to "un-analyze" the repo part. Where is the file usually located? I can't seem to find mine.

EDIT: Managed to solve it in another way. I just used the "repo.raw" field.

Hello!

Sorry for bringing up this topic again. Your regular expression was really helpful yesterday. However, I found a new pattern that I would really like to capture. I googled a lot yesterday but didn't find anything. The expression you gave me does its job and extracts all the repository names from the path. However, each repository contains different kinds of files, and I only want to extract the repository name when a .jar or .rpm file is involved.

Here are some typical lines

This is what is interesting.

/nexus/content/repositories/repository-name/blabla/blablabla/blablabla/bla/blablabla/blablabla/blablabla.jar

This is what's not interesting.

/nexus/content/repositories/repository-name/blabla/blablabla/blablabla/bla/blablabla/blablabla/blablabla.xml

Is there some way to still extract the same information while taking the .jar (or .rpm) extension into account?

You can either wrap your existing grok filter in a conditional, e.g.

if [uri] =~ /\.(jar|rpm)$/ {
  grok {
    ...
  }
}

or add .*\.(jar|rpm)$ to the end of your existing grok expression. However, messages not matching that will get the _grokparsefailure tag so you'll probably want to reconfigure the grok filter to not add that tag.
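
A sketch of the second option, assuming the path is in a field named "uri" as in the earlier example (adjust to "message" or whatever your events use). Setting tag_on_failure to an empty list is one way to stop the grok filter from tagging non-matching events:

```
filter {
  grok {
    # Match only paths ending in .jar or .rpm. The "uri" field name is an
    # assumption carried over from the earlier example.
    match => ["uri", "/repositories/(?<repo>[^/]+).*\.(jar|rpm)$"]
    # An empty list means no tag (such as _grokparsefailure) is added
    # when the expression does not match.
    tag_on_failure => []
  }
}
```

With this, events for .xml and other files pass through unchanged and simply lack the "repo" field.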

Thank you. Most helpful. Will give it a try! :)

Right now my filter looks like this.

grok {
  type => "nexus-log"
  patterns_dir => "./config-dir/patterns"
  match => ["message", "\b\w+\b\s/nexus/content/repositories/(?<repo>[^/]+)"]
}

So I can basically do like this?

if [message] =~ /\.(jar|rpm)$/ {
  grok {
    ...
  }
}

Yes, that should work fine.

So just to clarify: if I choose not to go with the "if" approach, because it didn't seem to work for me, should the line look something like this?

\b\w+\b\s/nexus/content/repositories/(?<repo>[^/]+).*\.(jar|rpm)$