Grok - Parsing a String with Spaces

For a while I've been trying to parse a syslog input in Logstash. I've encountered something like this:

2019-08-05 08:55:15 - jcramer(string with user information) - Successfully updated profile

(This is an example meant to resemble part of the actual log; it is not in proper syslog format.)

My end goal is for the fields to appear like so:

"syslog_timestamp" => "2019-08-05 08:55:15"
"syslog_username" => "jcramer"
"syslog_userstring" => "string with user information"
"syslog_message" => "Successfully updated profile"

I thought of parsing it like this:

grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:syslog_timestamp} - 
    %{USERNAME:syslog_username}(%{NUMBER:syslog_userstring}) - 
    %{GREEDYDATA:syslog_message}" }
}

The NUMBER pattern will only read until the first space; it won't read the whole string. The GREEDYDATA Grok pattern seems to successfully parse the entire last entry into a single field, so I'm wondering if this is the pattern I should use.

How can I use Grok to parse the entire string between the parentheses (i.e. syslog_userstring)? This string varies with every log and follows no particular format.

I would match a string of characters that are not a closing parenthesis:

grok { match => { "message" => "^%{TIMESTAMP_ISO8601:syslog_timestamp} - %{USERNAME:syslog_username}\((?<syslog_userstring>[^)]+)\)\s+- %{GREEDYDATA:syslog_message}" } }

You should get into the habit of anchoring patterns when you can.

Thanks for the help badger. Unfortunately this suggestion resulted in a _grokparsefailure.

I should mention, the parentheses can sometimes surround nothing, i.e. jcramer(), or a string with special characters like -, /, and &.

Maybe it doesn't matter what the string contains. I may not have integrated your code correctly, but I've tested a few different scenarios. In your suggestion, is \s+ necessary if the log were structured like this?

2019-08-05 08:55:15 - jcramer(string with user information)[Successfully updated profile]

Change [^)]+ to

[^)]*

if the parentheses can be empty. Special characters are not a problem, as long as there is never a ) in the string.

If the whitespace and dash are not there, then remove '\s+- ' from the pattern.
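
Putting the [^)]* change into the earlier filter, it might look something like the sketch below. I have not run this against your real data, so treat it as a starting point. One thing to check: as far as I know grok drops empty captures by default, so for jcramer() the syslog_userstring field would simply be absent unless you enable keep_empty_captures.

```
grok {
    # [^)]* also matches empty parentheses, e.g. jcramer()
    match => { "message" => "^%{TIMESTAMP_ISO8601:syslog_timestamp} - %{USERNAME:syslog_username}\((?<syslog_userstring>[^)]*)\)\s+- %{GREEDYDATA:syslog_message}" }
    # Optional: keep syslog_userstring as an empty field when the parentheses are empty
    # keep_empty_captures => true
}
```

For the second log layout, where the message follows the parentheses directly, simply removing '\s+- ' would leave the square brackets inside syslog_message. If you would rather strip them, one option (assuming the message is always wrapped in [ ] and never contains a ]) is:

```
grok {
    # Assumption: the line always ends with "[...]" and the message contains no "]"
    match => { "message" => "^%{TIMESTAMP_ISO8601:syslog_timestamp} - %{USERNAME:syslog_username}\((?<syslog_userstring>[^)]*)\)\[(?<syslog_message>[^\]]*)\]$" }
}
```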

Thank you! Changing the syntax worked.

Also, did you suggest anchors so I can avoid using the GREEDYDATA pattern? I'd love to do this efficiently, so I'm interested to know.

That being said, I'm also having trouble finding sources to understand the syntax of creating Grok patterns. Let me know if you have one in mind. I appreciate the help.

No, I suggested anchoring because it makes the patterns fail more quickly if the line does not match the pattern. Having the last thing in the pattern be GREEDYDATA (as you have) is cheap. Having GREEDYDATA anywhere else in the pattern is not cheap, because it can cause backtracking.
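
As a rough illustration of the difference (the field name "prefix" is made up for the example, and these are two alternatives to compare, not filters to use together):

```
# Cheap: anchored with ^, and GREEDYDATA is the last thing in the pattern.
# A line that does not start with a timestamp is rejected at position 0.
grok { match => { "message" => "^%{TIMESTAMP_ISO8601:syslog_timestamp} - %{GREEDYDATA:syslog_message}" } }

# Potentially expensive: no anchor, and GREEDYDATA comes first.
# It consumes the whole line, then backtracks looking for " - username(",
# and on a non-matching line the engine retries from later start positions.
grok { match => { "message" => "%{GREEDYDATA:prefix} - %{USERNAME:syslog_username}\(" } }
```

On lines that match, the two may behave similarly. The cost shows up mostly on lines that do not match, which is where anchoring helps.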
