Problem "grok"ing when part of the pattern is optional

If I have this text:

blah1=faa faa2 blah2=fee blah3=fii blah4=foo

how can I capture the value of each variable (the names are the parts before the equals sign) with a regex? I’ve tried this grok pattern:

blah1=(?<blah1>[^=]+) blah2=(?<blah2>[^=]+) blah3=(?<blah3>[^=]+) blah4=(?<blah4>[^\n]+)

but it doesn’t match when the text changes to something like this:

blah1=faa faa2 blah2=fee blah3=fii

I’ve tried with something like this:

blah1=(?<blah1>[^=]+) blah2=(?<blah2>[^=]+) blah3=(?<blah3>[^=]+)(| blah4=(?<blah4>[^\n]+))

And it works, BUT that one doesn’t work with the previous text (it only matches up to the “blah3” part):

blah1=faa faa2 blah2=fee blah3=fii blah4=foo

So, how should I write the grok pattern so that it works in both cases?

If the message you want to parse has this format, you do not need grok. It is a key-value message, so you can parse it easily with the kv filter.

You will just need to use a mutate filter to change your message because you have unquoted values with spaces.

The following filter combination will parse your message:

filter {
    mutate {
        gsub => ["message", "(\S+=)", ",\1"]
    }
    mutate {
        gsub => ["message", " ,", ","]
    }
    kv {
        source => "message"
        field_split => ","
    } 
}

The first mutate with gsub changes your message from this:

blah1=faa faa2 blah2=fee blah3=fii blah4=foo

To this:

,blah1=faa faa2 ,blah2=fee ,blah3=fii ,blah4=foo

The second mutate with gsub removes the extra space before each comma, so your final message will be:

,blah1=faa faa2,blah2=fee,blah3=fii,blah4=foo

Now you can parse the message with the kv filter, setting field_split to a comma, and you will get your fields as in this sample output:

{
         "blah3" => "fii",
         "blah4" => "foo",
      "@version" => "1",
       "message" => ",blah1=faa faa2,blah2=fee,blah3=fii,blah4=foo",
    "@timestamp" => 2022-12-23T14:56:10.353Z,
          "host" => "elk-lab",
         "blah2" => "fee",
         "blah1" => "faa faa2"
}
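
If you want to trace the three steps outside Logstash, here is a rough Python sketch of the same idea. It is only an approximation: parse_kv is a hypothetical helper name, Python's re.sub stands in for mutate/gsub, and a plain split stands in for the kv filter, which has more options and behaviour than this.

```python
import re


def parse_kv(message):
    """Approximate the mutate/gsub + kv combination (illustrative only)."""
    # Step 1: put a comma in front of every "key=" token (first gsub).
    step1 = re.sub(r"(\S+=)", r",\1", message)
    # Step 2: drop the space left before each inserted comma (second gsub).
    step2 = re.sub(r" ,", ",", step1)
    # Step 3: stand-in for kv with field_split => ",":
    # split on commas, then split each piece at its first "=".
    fields = {}
    for pair in step2.split(","):
        if "=" in pair:
            key, value = pair.split("=", 1)
            fields[key] = value
    return step1, step2, fields
```

Calling parse_kv on the sample line gives the two intermediate strings shown above and a dict with blah1 through blah4; on the shorter line, blah4 is simply absent. In this sketch a value that itself contains a comma or an equals sign would be split incorrectly, so check your real data for that.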


Wow, the part about adding the commas is clever. I'll try it. Thanks!
But I'm still confused about why, when I have an optional part (with "(|..." or even with "(?:....)?"), it always assumes the optional part (the blah4 in my example) doesn't exist.

I totally agree with Leandro that grok is the wrong tool, but I found this an interesting question so I decided to answer it...

The problem with your second regexp

blah1=(?<blah1>[^=]+) blah2=(?<blah2>[^=]+) blah3=(?<blah3>[^=]+)(| blah4=(?<blah4>[^\n]+))

is that the blah3 pattern consumes everything that is not an equals sign (i.e. fii blah4), and then the alternation in the blah4 group allows an empty match, so the overall pattern consumes

blah1=faa faa2 blah2=fee blah3=fii blah4

from

blah1=faa faa2 blah2=fee blah3=fii blah4=foo

and leaves the =foo unmatched. We can fix that by anchoring the end of the pattern using $ to force the whole line to be consumed. The regex engine then has to backtrack until blah3 captures only fii and the second alternative matches blah4=foo.

    match => { "message" => "blah1=(?<blah1>[^=]+) blah2=(?<blah2>[^=]+) blah3=(?<blah3>[^=]+)(| blah4=(?<blah4>[^\n]+))$" }

results in

{
     "blah1" => "faa faa2",
     "blah4" => "foo",
     "blah3" => "fii",
     "blah2" => "fee"
}
{
     "blah1" => "faa faa2",
     "blah3" => "fii",
     "blah2" => "fee"
}
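
The same behaviour can be reproduced outside Logstash. This is a small Python sketch, not grok itself: Python spells named groups (?P<name>...) where grok uses (?<name>...), and fields is a hypothetical helper. The greedy matching and backtracking involved work the same way in both engines.

```python
import re

# Same pattern as above, translated to Python's named-group syntax.
BASE = (
    r"blah1=(?P<blah1>[^=]+) blah2=(?P<blah2>[^=]+) "
    r"blah3=(?P<blah3>[^=]+)(| blah4=(?P<blah4>[^\n]+))"
)
UNANCHORED = re.compile(BASE)
ANCHORED = re.compile(BASE + r"$")  # $ forces the match to reach end of line

LONG = "blah1=faa faa2 blah2=fee blah3=fii blah4=foo"
SHORT = "blah1=faa faa2 blah2=fee blah3=fii"


def fields(pattern, line):
    """Return only the named groups that actually captured something."""
    match = pattern.search(line)
    if match is None:
        return None
    return {k: v for k, v in match.groupdict().items() if v is not None}
```

With these definitions, fields(UNANCHORED, LONG) has blah3 equal to "fii blah4" and no blah4 at all, while fields(ANCHORED, LONG) has blah3 equal to "fii" and blah4 equal to "foo". fields(ANCHORED, SHORT) returns just blah1, blah2 and blah3.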

Now I see @leandrojmp was totally right about not using grok in this case, but as you @Badger said, this was interesting to clarify.
I just tested @Badger 's solution and it worked perfectly.
Now, in production I'm using @leandrojmp 's approach and it works like a charm.
Thank you both. Every day you learn something.
