Basic Logstash mutations

So, for whatever reason, I'm having a hard time grasping certain aspects of the filter portion of the Logstash pipeline. I see plenty of examples of how to do specific things, but I can't seem to find one that matches my use case.

So first off, this is my basic object as it currently arrives in Elasticsearch:

      {
        "_index" : "logstash",
        "_type" : "_doc",
        "_id" : "9ZG12WwBMTGr8z96UI4O",
        "_score" : 1.0,
        "_source" : {
          "userId" : 12632,
          "tab" : "3900b968-3775-415f-9e12-cb5997aa0643",
          "username" : "john.smith",
          "@version" : "1",
          "@timestamp" : "2019-08-28T19:30:34.023Z",
          "headers" : {
            "connection" : "keep-alive",
            "request_path" : "/",
            "accept_encoding" : "gzip, deflate, br",
            "accept_language" : "en-US,en;q=0.9,mt;q=0.8",
            "http_user_agent" : "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36",
            "http_accept" : "application/json, text/plain, */*",
            "sec_fetch_mode" : "cors",
            "content_type" : "text/plain",
            "request_method" : "POST",
            "http_host" : "localhost:3000",
            "origin" : "https://staging1.acme.services",
            "http_version" : "HTTP/1.1",
            "content_length" : "308",
            "sec_fetch_site" : "cross-site"
          },
          "sourcePath" : "/answers/4/place/4",
          "destinationPath" : "/tax-charts/4",
          "token" : "eyJhbGciOiJIasdfasdfadfasdfasdfasdfySUQiOjEyNjMyLCJleHAiOjE1Njc1NDM5MzUsImlzcyI6InR0ciJ9.KwDWdthXtXA26zZ1h_RjEXPiWY4WMOAM08PfKRm6OTI"
        }
      }

which is generated from a JSON POST that looks like this:

{"sourcePath":"/answers/4/place/4","destinationPath":"/tax-charts/4","username":"john.smith","userId":12632,"token":"eyJhbGciOiJIasdfasdfadfasdfasdfasdfySUQiOjEyNjMyLCJleHAiOjE1Njc1NDM5MzUsImlzcyI6InR0ciJ9.KwDWdthXtXA26zZ1h_RjEXPiWY4WMOAM08PfKRm6OTI","tab":"3900b968-3775-415f-9e12-cb5997aa0643"}

What I would like to do is add fields when certain regex conditions are true, and in particular to use capture groups from those regex expressions with translate.

So for instance:

    if [destinationPath] =~ "\/matrix|answers|tax-charts\/(\d+)\/" { # (this should have a capture group of a library id...)
      mutate { add_field => { "libraryId" => "$1" } } # is it possible to refer back to that last capture group? How else do I specify just the id part of the field?
      translate {
        field       => "libraryId"
        destination => "library"
        dictionary  => {
          "1" => "Production"
          "2" => "Retail"
          "3" => "Automotive Leasing"
          "4" => "Automotive Sales"
          "5" => "Automotive Part Sales"
          ...
        }
      }
    }

For whatever reason, I'm just having a terrible time figuring out grok. I'm certain the answer is in there. In fact, if you can point me toward any resource that will ELI5 it, I'd be so grateful.

The best I could come up with is:

    grok { match => { destinationPath => "/answers|matrix|tax-charts/(?<libraryId>\d+)/place/(?<placeId>\d+)" } }

But it doesn't work. I'm not sure what I'm not getting about this, whether it's the structure of a grok statement or the Oniguruma regex syntax, but help is appreciated.

That will match against any one of

    /answers
    matrix
    tax-charts/(?<libraryId>\d+)/place/(?<placeId>\d+)

The first two do not capture any fields. You need parentheses around the three options for the first part of the URL. Also, the example tax-charts URL you give does not have a /place/ in it, so the last part has to be optional, which we can do using ( )?

    grok { match => { "destinationPath" => "/(answers|matrix|tax-charts)/(?<libraryId>\d+)(/place/(?<placeId>\d+))?" } }
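Once grok has populated [libraryId] you can feed it straight into the translate you already wrote; the =~ conditional and the mutate are not needed at all. Untested, and with the rest of your dictionary filled in, but something like

    grok {
      match => { "destinationPath" => "/(answers|matrix|tax-charts)/(?<libraryId>\d+)(/place/(?<placeId>\d+))?" }
    }
    translate {
      field       => "libraryId"
      destination => "library"
      dictionary  => {
        "1" => "Production"
        "2" => "Retail"
      }
    }

If the grok does not match, it just adds a _grokparsefailure tag and the event moves on.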

Ooh, thanks @Badger, that ()? optional syntax is really useful!

I did end up with a couple of grok lines, so if one didn't match the next one might, but each miss was adding a ["_grokparsefailure"] tag. This should save me a few variations, as I note below.

I guess part of my problem was the mentality that I wanted to write several filter statements that each did one action to a field -- but the grok way is to parse the entire field at once. I'm not sure that works well here, since I might have multiple data formats in a given field (i.e., URL structure), but I guess I can have if/else trees with =~ regex patterns to match the right grok statement to the right field value format, as in the sketch below.
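Roughly this shape, I'm thinking (untested, and the two patterns are just a couple of my routes):

    if [destinationPath] =~ /^\/answers\// {
      grok { match => { "destinationPath" => "/answers/(?<libraryId>\d+)/place/(?<placeId>\d+)" } }
    } else if [destinationPath] =~ /^\/tax-charts\// {
      grok { match => { "destinationPath" => "/tax-charts/(?<libraryId>\d+)" } }
    }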

Unfortunately there are roughly 20 different variations of URL routes, but once I get them all mapped out I'm sure it won't be too bad.
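One thing that should help: from what I can tell, grok's match accepts an array of patterns, and since break_on_match defaults to true it stops at the first one that matches. So the variations can at least live in one filter, most specific first, rather than a giant if/else tree. A sketch with two of them:

    grok {
      match => { "destinationPath" => [
        "/(answers|matrix|tax-charts)/(?<libraryId>\d+)/place/(?<placeId>\d+)",
        "/(answers|matrix|tax-charts)/(?<libraryId>\d+)"
      ] }
    }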

I disagree :slight_smile: That is the way a lot of grok examples are written, because grok is well matched to standardized formats. And most folks start with a single-line format, such as a web server log, so that is how they first learn to use grok, and so that is how they first write an example of using it. But you can use grok to pull out more than one small field from differently formatted lines, as mentioned here. And if you have a fixed prefix, it may make a lot more sense to use dissect to parse that prefix, as described here.
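For example, with your paths, something along these lines (untested) would peel off the first path segment cheaply, after which you can route on [section] and grok only the remainder:

    dissect {
      mapping => { "destinationPath" => "/%{section}/%{rest}" }
    }

In dissect the last field captures whatever is left of the string, so for "/answers/4/place/4" that yields [section] = "answers" and [rest] = "4/place/4".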

grok can parse almost anything, and as a result folks tend to use it to parse almost everything, but parsing the entire line using a single grok expression is often not the best approach. It will work, but there may be other approaches that in the long run are easier to maintain, and/or less CPU intensive, and/or less fragile.

Sometimes you get all of the ands :wink:
