Parsing logs with regexes

I am parsing logs that look roughly like this:

timestamp... NEW EVENT ...timestamp

I am having trouble storing the NEW EVENT part of the log. When I try

%{DATESTAMP:datestamp}... (%{WORD:action} )* ...%{DATESTAMP:occurence}

Only EVENT is stored in action, but I want to store NEW EVENT. I can't figure out how to do this. There does not seem to be a regex that allows for it. Is there syntax I am missing that can do this?

Thanks!

Can you post the full raw log line?
You can debug your pattern at https://grokdebug.herokuapp.com

WORD matches one word and "NEW EVENT" clearly is two words. If you give a more concrete example of what you want to match it'll be easier to help out.
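The symptom in the question (only EVENT ending up in action) is consistent with how regex engines usually treat a capture group inside a repetition: every iteration overwrites the previous capture, so only the last word survives. Below is a minimal Python sketch of that effect. It assumes \w+ as a rough stand-in for grok's WORD and uses a shortened, made-up sample line. Note that Python spells named groups (?P<name>...), whereas grok's regex syntax uses (?<name>...).

```python
import re

# Shortened, made-up sample line (not the full production log).
line = "NEW EVENT 05/02/17"

# Rough stand-in for (%{WORD:action} )* : a named group inside a repetition.
repeated = re.match(r"(?:(?P<action>\w+) )*", line)

# Each iteration overwrites the capture, so only the last word is kept.
last_word = repeated.group("action")
print(last_word)
```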

05/02/17 22:11:01 host [1234]: NEW EVENT 05/02/17 22:11:01 123456789 ACC/PRI INT PREC TOT ACC/INTPRI DIFF HIGH LIMIT 1 WARNING|user ACKNOWLEDGED 0

%{DATESTAMP:datestamp} %{HOSTNAME:hostName} (%{PROG:processName})?\[%{NUMBER:processID}\]: +(%{HOSTNAME:node})? +(%{NOTSPACE:user})? +(%{NOTSPACE:action} )+ +%{DATESTAMP:occurence} +%{DATA:origin} +%{DATA:description} +%{DATA:event} +%{DATA:value} +%{NUMBER:severity}

The action part is what I want to capture as a single field. Currently it stores EVENT rather than NEW EVENT.

Update: I've got it working with a little workaround. Here is the code.

%{DATESTAMP:datestamp} %{HOSTNAME:hostName} (%{PROG:processName})?\[%{NUMBER:processID}\]: +(%{HOSTNAME:node})? +(%{NOTSPACE:user})? %{NOTSPACE:action1} +(%{NOTSPACE:action2} )+ +%{DATESTAMP:occurence} +%{DATA:origin} +%{DATA:description} +%{DATA:event} +%{DATA:value} +%{NUMBER:severity}

# Join the two captured words into a single action field.
if [action1] and [action2] {
	mutate {
		add_field => { "action" => "%{action1} %{action2}" }
	}
}

Ideally I wouldn't need that second chunk of code or have to store action1 and action2. If anyone has a solution that only changes my grok pattern, that'd be appreciated.

You can e.g. use (?<action>\w+ \w+) to match and capture two consecutive words.
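To illustrate what such an inline named capture does, here is a small Python sketch run against a shortened version of the sample line. Two things are assumptions made for the sketch: Python writes the group as (?P<action>...) rather than (?<action>...), and the date sub-pattern is a simplified stand-in for DATESTAMP. Treat it as an illustration, not a drop-in grok pattern.

```python
import re

# Shortened version of the sample log line from this thread.
line = "NEW EVENT 05/02/17 22:11:01 123456789"

# Two words separated by one space are captured into a single named group,
# followed by a simplified stand-in for %{DATESTAMP:occurence}.
pattern = re.compile(
    r"(?P<action>\w+ \w+) +"
    r"(?P<occurence>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})"
)

m = pattern.search(line)
fields = m.groupdict()
print(fields)
```

Because both words sit inside one group, no action1/action2 fields and no follow-up mutate step are needed.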

Thanks @magnusbaeck, that worked. Can you help me understand it so that I can write grok patterns better in the future? I understand that \w is a word character, but the plus in front of the action is confusing to me.

Plus sign means "one or more occurrences of the immediately preceding token".

Oops, I meant the ? in front of action. I think I understand the \w+ \w+. From what I understand of regexes, ? means match one or more of the previous token. But when it is inside the (), I am unsure whether that means the action field is optional or not.

? means "zero or one occurrence of the preceding token". Parentheses can be used to group tokens, i.e. they change the meaning of "preceding token". With abc? it's just the "c" that's optional, but with (abc)? it's the whole phrase "abc" that's optional.
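One detail specific to (?<action>...): the ? directly after the opening parenthesis is not a quantifier at all. It is part of the syntax that names the capture group, so the group itself stays mandatory. The following Python sketch checks all three cases. The helper name matches is hypothetical, and Python's (?P<action>...) spelling again stands in for grok's (?<action>...).

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# abc? : only "c" is optional, so "ab" matches but the empty string does not.
c_optional = (matches(r"abc?", "ab"), matches(r"abc?", ""))

# (abc)? : the whole group is optional, so "" matches but "ab" does not.
group_optional = (matches(r"(abc)?", ""), matches(r"(abc)?", "ab"))

# (?P<action>...) : the "?" after "(" only names the group; it is not a
# quantifier, so an empty string does not match.
named_required = matches(r"(?P<action>\w+ \w+)", "")

print(c_optional, group_optional, named_required)
```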
