Take out bits of a URIPATH in Logstash

simonrisberg · June 30, 2015, 6:20am

Hi!

My name is Simon and I'm pretty new to ELK. I have been working with it for a few weeks now and I'm starting to understand how everything works (I think?). Back to the point: right now I am using Logstash to read a Nexus log file that shows which repositories are being updated. The problem is that each log message contains the whole path to the repository. I want to make a custom grok filter that extracts only the repository name.

There is a pattern: the name "repositories" always comes directly before the specific repository name, as in "nexus/blablabla/repositories/monkey".

I want to extract the "monkey" part so I can get useful data out of it when I'm visualizing it in Kibana 4. So far I've managed to create a custom grok filter that gets it down to "repositories/monkey" but that's it.

Any help would be greatly appreciated.

Best Regards

Simon

magnusbaeck · June 30, 2015, 6:30am

So, given nexus/blablabla/repositories/monkey you want to extract "monkey"? This should work:

grok {
  match => ["uri", "/repositories/(?<repo>[^/]+)"]
}

simonrisberg · June 30, 2015, 6:31am

Thank you. I will try this! Can I just write the regular expression directly into the logstash config file without using a grok pattern? I didn't know that.

simonrisberg · June 30, 2015, 6:50am

Hello again. I tried putting in your pattern. Unfortunately I got the same result as I did with the old one I created, which was "repositories/my-repository-name". Now my field also shows up under "missing fields". What does this mean?

Best regards.

magnusbaeck · June 30, 2015, 7:32am

Can I just write the regular expression directly into the logstash config file without using a grok pattern? I didn't know that.

Grok is just a convenience layer on top of plain old regular expressions.

I tried putting in your pattern. Unfortunately I got the same result as I did with the old one I created, which was "repositories/my-repository-name".

What, exactly, did you try? The following minimal example works as expected:

$ cat test.config 
input { stdin { codec => plain } }
output { stdout { codec => rubydebug } }
filter {
  grok {
    match => ["message", "/repositories/(?<repo>[^/]+)"]
  }
}
$ cat data
/foo/bar/repositories/baz
$ /opt/logstash/bin/logstash -f test.config < data
{
       "message" => "/foo/bar/repositories/baz",
      "@version" => "1",
    "@timestamp" => "2015-06-30T07:30:12.942Z",
          "host" => "redacted",
          "repo" => "baz"
}

Now my field also shows up under "missing fields". What does this mean?

Where are you getting this?

simonrisberg · June 30, 2015, 7:33am

This is what my logstash config looks like right now. I'm using a custom grok pattern called NEXUSREP that I've created. I have it in a directory called "patterns" which is located in the same folder as the logstash config. The one you see below is the filter part of my logstash config. I get the "missing fields" part in Kibana 4 because I can't locate it as a field on the left side.

filter {
  grok {
    type => "nexus-log"
    patterns_dir => "./config-dir/patterns"
    match => ["message", "%{NEXUSREP:repository}"]
  }
}

simonrisberg · June 30, 2015, 7:44am

I tried to put the regular expression directly into my "match" path after the message part instead of referring to my custom grok filter. I see that you are jumping between two different files here. I only have the Logstash config file to go on.

magnusbaeck · June 30, 2015, 7:48am

What's your definition of NEXUSREP?

I tried to put the regular expression directly into my "match" path after the message part instead of referring to my custom grok filter.

That's what I'd do too. Didn't this work?

I see that you are jumping between two different files here. I only have the Logstash config file to go on.

In my example the "data" file is the test input file. Or what are you referring to?

simonrisberg · June 30, 2015, 7:53am

Hello, sorry if I was a little unclear. Your solution did work; it was just me who messed it up a little bit. I really need to get better at using regular expressions. I guess I can take your regular expression and put it into a custom grok filter and refer to it?

Thank you so much for your help. I do have one last question, though. When I use "Terms" to visualize the repositories, names like "hej-hello-hi" or "monkey-hey-monkey" get split up: Kibana treats, for example, "hej" and "hello" as separate repositories, which is a problem. Do you know a solution for this?

magnusbaeck · June 30, 2015, 8:25am

I guess I can take your regular expression and put it into a custom grok filter and refer to it?

Yes, that should be fine.
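A minimal sketch of what that could look like, assuming the patterns directory mentioned earlier in the thread. The file name "nexus" and the pattern name NEXUSREPONAME are made up for illustration. A pattern file holds one "NAME regex" definition per line:

```
# ./config-dir/patterns/nexus  (hypothetical file name)
# NEXUSREPONAME is a hypothetical pattern name: one path segment without slashes.
NEXUSREPONAME [^/]+
```

The grok filter then refers to the pattern by name and stores the match in the "repo" field:

```
filter {
  grok {
    # Directory that contains the custom pattern file above.
    patterns_dir => "./config-dir/patterns"
    # Equivalent to the inline expression /repositories/(?<repo>[^/]+)
    match => ["message", "/repositories/%{NEXUSREPONAME:repo}"]
  }
}
```

Keeping the literal "/repositories/" prefix in the match string and only the reusable part in the pattern file is a design choice; the whole expression could also live in the pattern file.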

Thank you so much for your help. I do have one last question, though. When I use "Terms" to visualize the repositories, names like "hej-hello-hi" or "monkey-hey-monkey" get split up: Kibana treats, for example, "hej" and "hello" as separate repositories, which is a problem. Do you know a solution for this?

That's a completely different question, but in short you need to make sure the field isn't analyzed (or is analyzed differently), typically done by configuring a custom index template and setting the field in question to not_analyzed.

magnusbaeck · June 30, 2015, 8:47am

That's a completely different question, but in short you need to make sure the field isn't analyzed (or is analyzed differently), typically done by configuring a custom index template and setting the field in question to not_analyzed.

Related topic:

simonrisberg · June 30, 2015, 9:09am

Thank you very much. You've been really helpful to a newbie like me.

simonrisberg · June 30, 2015, 9:15am

One more thing. I restarted Logstash, Elasticsearch, and Kibana. Now it puts the "repo" field under "missing fields". Does this mean that it's not indexed or analyzed?

EDIT: No worries I fixed it. I just had to go and re-index the fields under the settings tab in Kibana.

simonrisberg · June 30, 2015, 9:19am

Now the last thing left is to "un-analyze" the repo part. Where is the file usually located? I can't seem to find mine.

EDIT: Managed to solve it in another way. I just took the "repo.raw" field.

simonrisberg · July 1, 2015, 7:45am

Hello!

Sorry for bringing up this topic again. Your regular expression was really helpful yesterday. However, I found a new pattern that I would really like to capture. I googled a lot yesterday but didn't find anything. The expression you gave me does its job and extracts all the repository names from the path. However, the repositories contain different kinds of files, and I only want to extract the repository name when a .jar or .rpm file is involved.

Here are some typical lines

This is what is interesting.

/nexus/content/repositories/repository-name/blabla/blablabla/blablabla/bla/blablabla/blablabla/blablabla.jar

This is what's not interesting.

/nexus/content/repositories/repository-name/blabla/blablabla/blablabla/bla/blablabla/blablabla/blablabla.xml

Is there some way to still extract the same information, but only when the path ends in .jar (or .rpm)?

magnusbaeck · July 1, 2015, 9:17am

You can either wrap your existing grok filter in a conditional, e.g.

if [uri] =~ /\.(jar|rpm)$/ {
  grok {
    ...
  }
}

or add .*\.(jar|rpm)$ to the end of your existing grok expression. However, messages not matching that will get the _grokparsefailure tag so you'll probably want to reconfigure the grok filter to not add that tag.
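As a rough sketch of the second option, assuming the path is in the "message" field and is the last thing on the line (otherwise the trailing $ will not match):

```
filter {
  grok {
    # Same expression as before, extended so it only matches .jar and .rpm paths.
    match => ["message", "/repositories/(?<repo>[^/]+).*\.(jar|rpm)$"]
    # An empty list means no _grokparsefailure tag is added to events that don't match.
    tag_on_failure => []
  }
}
```

With this, events for other file types simply pass through without a "repo" field and without the failure tag.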

simonrisberg · July 1, 2015, 9:25am

Thank you. Most helpful. Will give it a try!

simonrisberg · July 1, 2015, 9:39am

Right now my filter looks like this.

grok {
  type => "nexus-log"
  patterns_dir => "./config-dir/patterns"
  match => ["message", "\b\w+\b\s/nexus/content/repositories/(?<repo>[^/]+)"]
}

So I can basically do like this?

if [message] =~ /\.(jar|rpm)$/ {
  grok {
    ...
  }
}

magnusbaeck · July 1, 2015, 9:51am

Yes, that should work fine.

simonrisberg · July 1, 2015, 12:04pm

So just to clarify: if I choose not to go with the "if" path, because it didn't seem to work for me, should the line look something like this?

\b\w+\b\s/nexus/content/repositories/(?<repo>[^/]+).*\.(jar|rpm)$
