Split grok-pattern into multiple lines

Hi,

Is it possible to split a grok pattern across multiple lines instead of having one big line?

grok {
  pattern_definitions => {
    "CM" => "[/.\-\w\s]*"
  }
  match => {"syslog_message" => "%{CM:unknown_id} %{IP:this_ip},%{HOSTNAME:hostname},%{CM:resource_type},%{CM:some_name},%{CM:unknown_01},%{IP:src_ip},%{CM:unknown_02},%{IP:dst_ip},%{NUMBER:src_port:int},%{NUMBER:dst_port:int},%{CM:partition},%{CM:protocol},%{NUMBER:domain},%{CM:unknown_03},%{CM:unknown_04},%{CM:unknown_05},%{CM:unknown_06},%{CM:unknown_09},%{CM:unknown_10},%{CM:unknown_11},%{CM:policy_type},%{CM:policy_name},%{CM:rule_name},%{CM:unknown_15},%{CM:dev_action},%{CM:unknown_17},%{CM:unknown_18},%{CM:unknown_19},%{CM:unknown_20},%{CM:unknown_21},%{CM:unknown_22},%{CM:unknown_23},%{CM:unknown_24},%{CM:unknown_25},%{CM:unknown_26},%{CM:unknown_27},%{CM:unknown_28},%{CM:unknown_29},%{CM:unknown_30}"}
}

This line is way too long.

Most likely yes, but we need to see the original message and what the result should be.

The parsing works fine, I see no need for a message.

The issue I'm having is that a line with 700+ characters is not ideal in git or in editors.
Is there a way to build the grok-pattern over multiple lines, with an << or += operator perhaps?

Can you share a sample of your message?

From the pattern you are using in the grok filter your message seems to be a csv message, you could use the csv filter to parse it instead of grok.
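As a rough sketch of what that could look like, assuming the field really is plain comma-separated text: the csv filter takes its column names as an array, which can be written one name per line. The column name "id_and_ip" is a made-up placeholder, because the first column in the grok pattern holds two space-separated values; the other names are copied from the grok pattern above.

```
filter {
  csv {
    source  => "syslog_message"
    # One column name per line keeps each line short in git and editors.
    columns => [
      "id_and_ip",      # placeholder: holds "<unknown_id> <this_ip>", split it later if needed
      "hostname",
      "resource_type",
      "some_name",
      "unknown_01",
      "src_ip",
      "unknown_02",
      "dst_ip",
      "src_port",
      "dst_port",
      "partition",
      "protocol",
      "domain"
    ]
    # Same effect as the :int suffix in the grok pattern.
    convert => {
      "src_port" => "integer"
      "dst_port" => "integer"
    }
  }
}
```

Only the first columns are named here. With the default settings, columns beyond the listed ones should get auto-generated names, so the list can be extended step by step. Unlike grok, csv does not validate that a value looks like an IP or a hostname.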

@leandrojmp has a good suggestion to use csv...

But to answer your question: yes, but it will not be efficient.

It would be something like

grok {
  pattern_definitions => {
    "CM" => "[/.\-\w\s]*"
  }
  match => {"syslog_message" => "%{CM:unknown_id} %{IP:this_ip},%{HOSTNAME:hostname},...%{GREEDYDATA:msg_part2}"}
}


grok {
  pattern_definitions => {
    "CM" => "[/.\-\w\s]*"
  }
  match => {"msg_part2" => "<Grok Patterns>%{GREEDYDATA:msg_part3}"}
}


grok {
  pattern_definitions => {
    "CM" => "[/.\-\w\s]*"
  }
  match => {"msg_part3" => "<Grok Patterns>"}
}

This is not as efficient, because every event passes through several grok filters instead of one.

Yes, you can use custom pattern definitions within a custom pattern definition.

input { generator { count => 1 lines => [ 'Foo, Or Bar,Or Baz' ] } }

output { stdout { codec => rubydebug { metadata => false } } }
filter {
    grok {
        pattern_definitions => {
            ONE => "%{WORD}"
            TWO => "(?<Foo>[^,]*),%{GREEDYDATA}"
            OVERALL => "%{ONE},%{TWO}"
        }
        match => { "message" => "^%{OVERALL}" }
    }
}

will produce

       "Foo" => " Or Bar"
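Applied to the pattern from the original question, the same idea might look roughly like the sketch below. The names FW_HEADER, FW_FLOW and FW_REST are invented for illustration, and the split points are arbitrary; only the first columns are spelled out, with the remainder captured in a placeholder field.

```
filter {
  grok {
    pattern_definitions => {
      "CM"        => "[/.\-\w\s]*"
      # Each chunk of the long pattern gets its own short line.
      "FW_HEADER" => "%{CM:unknown_id} %{IP:this_ip},%{HOSTNAME:hostname},%{CM:resource_type},%{CM:some_name}"
      "FW_FLOW"   => "%{CM:unknown_01},%{IP:src_ip},%{CM:unknown_02},%{IP:dst_ip},%{NUMBER:src_port:int},%{NUMBER:dst_port:int}"
      # Placeholder: the remaining columns would be split into further chunks the same way.
      "FW_REST"   => "%{GREEDYDATA:remaining_columns}"
    }
    # The match line only stitches the chunks together.
    match => { "syslog_message" => "^%{FW_HEADER},%{FW_FLOW},%{FW_REST}" }
  }
}
```

The named captures inside each chunk should still end up as fields on the event, just as they do when one of the stock grok patterns contains named captures. It is worth checking against a few real messages before relying on it.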

TIL!

Great I will try that.

Yes, the message looks very much like a CSV row in this example, but in our real config there are multiple “match” lines and the number of columns in the input varies.
The first and second columns determine which column holds which value.

The idea of using CSV is good and I had not thought of it. I will try it and see if we can use it.

Thanks!

This is one of the possible alternatives that we thought of.
We were also wondering about its performance.

Thanks!

I am not sure which performs better, csv or dissect; you can try both in your case. The dissect filter is ~10x faster than grok. However, csv is much more useful if you have a pure CSV format.
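For comparison, a dissect version of the first few columns could look something like this. It is only a sketch: the field names are copied from the grok pattern in the question, "rest" is a placeholder for everything not yet mapped, and it assumes the delimiters are always exactly one space and then commas.

```
filter {
  dissect {
    # Dissect splits on the literal text between the %{} fields; no regex is involved.
    mapping => {
      "syslog_message" => "%{unknown_id} %{this_ip},%{hostname},%{resource_type},%{some_name},%{unknown_01},%{src_ip},%{unknown_02},%{dst_ip},%{src_port},%{dst_port},%{rest}"
    }
    # Replaces the :int suffix used in the grok pattern.
    convert_datatype => {
      "src_port" => "int"
      "dst_port" => "int"
    }
  }
}
```

Note that the mapping is still a single string, so this mainly shortens the line by dropping the pattern names rather than splitting it, and dissect will not check that a value is a valid IP or hostname.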

This is not an issue: if you have different types of messages, you can still combine other filters or use conditionals to parse them correctly.
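Without sample messages this can only be a guess, but the general shape might be: parse the leading columns first, then branch on one of them. In the sketch below, "TYPE_A" and "TYPE_B" are invented values, the column lists per type are invented as well, and "rest" is a temporary placeholder field.

```
filter {
  # Peel off the leading columns; everything after them lands in "rest".
  dissect {
    mapping => {
      "syslog_message" => "%{unknown_id} %{this_ip},%{hostname},%{resource_type},%{rest}"
    }
  }

  # Hypothetical values: replace them with whatever actually identifies the layout.
  if [resource_type] == "TYPE_A" {
    csv {
      source  => "rest"
      columns => ["some_name", "src_ip", "dst_ip"]
    }
  } else if [resource_type] == "TYPE_B" {
    csv {
      source  => "rest"
      columns => ["policy_type", "policy_name", "rule_name"]
    }
  }

  # Drop the temporary field once the remaining columns have been parsed.
  mutate { remove_field => ["rest"] }
}
```

Each branch only has to describe one layout, so every column list stays short and readable.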

It is not clear what you mean by this; without a sample of your messages it is pretty complicated to provide any insight.

The main thing is that while grok can parse almost anything, sometimes you can use other parsing filters, or a combination of them, to make things easier.

Personally I only use grok as a last resort, when a message cannot be parsed using other filters or a combination of filters.
