Is there a reason not to use the NOTSPACE pattern in Grok?

I have written some grok patterns to parse Barracuda Spam Firewall mail logs. I initially used patterns like NUMBER, IPORHOST, and WORD. It turned out that some number fields contained a - or other odd characters, so I replaced those patterns with NOTSPACE, which has worked fine so far.

I haven't used grok a lot, so I wonder if there are any reasons not to use NOTSPACE like:

  • regex performance concerns among NOTSPACE vs NUMBER, WORD, IPORHOST, etc.?
  • don't want to insert a text value into a number field in ES (let the grok fail instead)?
  • anything else?

Thanks

Yeah, I believe you're on the right track. I think too many people view a grok expression as some kind of input validation where it's crucial to match each token exactly, but sometimes you just risk being too strict (you can also end up with an expression that's very complex, which can hurt both readability and performance).

regex performance concerns among NOTSPACE vs NUMBER, WORD, IPORHOST, etc.?

NOTSPACE should be just as fast as anything else. It expands to \S+, which is about as simple as a regex gets.

don't want to insert a text value into a number field in ES (let the grok fail instead)?

This is a valid reason to use INT or NUMBER. If you attempt to stuff a string value into an int field, ES will reject the request and you'll lose the event. That won't happen with a failed grok match: the event is still indexed (tagged with _grokparsefailure by default), so you'll at least be able to see the non-matching events and maybe do something about them.
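Here is a rough way to see that trade-off, using Python's re module as a stand-in for grok. The two regexes are simplified versions of grok's INT and NOTSPACE, and the sample lines and field names are made up for illustration, not taken from a real Barracuda log.

```python
import re

# Simplified stand-ins for the grok patterns (assumptions, not the exact definitions).
INT = r"[+-]?[0-9]+"
NOTSPACE = r"\S+"

# Hypothetical log layout: "<client> <score>"
STRICT = re.compile(rf"^(?P<client>{NOTSPACE}) (?P<score>{INT})$")
LENIENT = re.compile(rf"^(?P<client>{NOTSPACE}) (?P<score>{NOTSPACE})$")


def parse(pattern, line):
    """Mimic grok: return the captured fields, or tag the event on failure."""
    m = pattern.match(line)
    if m is None:
        # The event is kept, just flagged, so it can be inspected later.
        return {"message": line, "tags": ["_grokparsefailure"]}
    return {"message": line, **m.groupdict()}


good = "10.0.0.1 5"
odd = "10.0.0.1 -"

strict_good = parse(STRICT, good)  # score extracted as "5"
strict_odd = parse(STRICT, odd)    # tagged as a failure, no fields extracted
lenient_odd = parse(LENIENT, odd)  # score captured as the string "-"
```

With the strict pattern the odd line is flagged instead of being sent on with a bad value. With the lenient pattern it goes through, which is only safe if score is mapped as a string in ES.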


Thanks for the clarification.

For certain fields like port number and message ID, there's no reason to map them as int or long even though they contain digits. Those fields are mapped as string, so I just need grok to grab the values and send them to ES.
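A small sketch of that approach, again with Python's re module standing in for grok: capture every token as text and convert only the fields that really need to be numeric. The layout, the field names, and the size field are hypothetical, and dropping a field that fails to convert is just one possible choice.

```python
import re

# Hypothetical layout: "<client_ip> <port> <message_id> <size>"
LINE = re.compile(
    r"^(?P<client_ip>\S+) (?P<port>\S+) (?P<message_id>\S+) (?P<size>\S+)$"
)

# Only fields we want to do arithmetic on get converted; the rest stay strings.
NUMERIC_FIELDS = {"size": int}


def parse(line):
    """Return a dict of fields, or None if the line does not fit the layout."""
    m = LINE.match(line)
    if m is None:
        return None
    event = m.groupdict()
    for name, cast in NUMERIC_FIELDS.items():
        try:
            event[name] = cast(event[name])
        except ValueError:
            # Drop a value such as "-" rather than send text to a numeric field.
            del event[name]
    return event


event = parse("10.0.0.1 25 1402673187-0a1b2c 2048")
dash_event = parse("10.0.0.1 25 1402673187-0a1b2c -")
```

Here port and message_id stay strings no matter what they contain, while size becomes an integer only when the text really is one.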