Logstash with grok patterns applied on logs coming out of docker containers

Hi everyone,

I'm using docker labels to filter for certain messages and to apply the appropriate grok patterns.
The patterns work when I test them with the Grok Debugger, but I'm getting a _grokparsefailure tag when I use them in this configuration:

Content of logstash.conf

    filter {
      if [docker][container][labels] == "tomcat-container" {
        grok {
          patterns_dir => "/etc/logstash/patterns.d"
          match => { "message" => "%{MYLOGPATTERN}" }
        }
        mutate {
          add_tag => [ "docker_filter_label" ]
          remove_tag => ["beats_input_raw_event"]
        }
      }
    }

Content of /etc/logstash/patterns.d/custom-patterns

    MYLOGPATTERN %{DATESTAMP_EVENTLOG:timestamp}|%{TIMESTAMP_ISO8601:timestamp} *| %{DATA:thread} +| %{LOGLEVEL:level} +| %{GREEDYDATA:message}

The log entry coming out of the container (besides the docker metadata) is this:

    2019-05-17 16:04:35 | 3-thread-1 | DEBUG | z.z.z.Jobrun | Doing job run. Next run scheduled.

It's working when I use just the filter and tagging options:

    filter {
      if [docker][container][labels] == "tomcat-container" {
        mutate {
          add_tag => [ "docker_filter_label" ]
          remove_tag => ["beats_input_raw_event"]
        }
      }
    }

Logstash is able to compile the patterns:

    [2019-05-18T15:19:21,656][DEBUG][logstash.filters.grok ] Grok compiled OK {:pattern=>"%{MYLOGPATTERN}", :expanded_pattern=>"(?:(?<DATESTAMP_EVENTLOG:timestamp>(?:(?>\d\d){1,2})(?:(?:0[1-9]|1[0-2]))(?:(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]))(?:(?:2[0123]|[01]?[0-9]))(?:(?:[0-5][0-9]))(?:(?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)))|(?<TIMESTAMP_ISO8601:timestamp>(?:(?>\d\d){1,2})-(?:(?:0?[1-9]|1[0-2]))-(?:(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]))T :?(?:(?:[0-5][0-9]))(?::?(?:(?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)))?(?:(?:Z|+-(?::?(?:(?:[0-5][0-9])))))?) )"}

Any ideas?

Your log entry does not have two timestamps. You should blockquote your log entries, patterns, and second configuration the same way you blockquoted the first configuration (i.e. indent by 4 spaces). Otherwise we have to guess which characters in your pattern have been consumed as markdown by spotting where your text is in italics etc.

The following works, assuming you need to consume timestamps in two different formats. Not sure whether you want to use the overwrite option or not.

    pattern_definitions => { "MYLOGPATTERN" => "^(%{DATESTAMP_EVENTLOG:timestamp}|%{TIMESTAMP_ISO8601:timestamp}) *\| %{DATA:thread} +\| %{LOGLEVEL:level} +\| %{GREEDYDATA:message}" }
    match => { "message" => "%{MYLOGPATTERN}" }
    #overwrite => [ "message" ]

Note that match failures are much cheaper when you anchor your regexps to start of line using ^.
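To see why, here is a quick sketch in Python (using a simplified, made-up stand-in for the grok timestamp prefix, not the real DATESTAMP_EVENTLOG pattern): without the anchor, the regex engine retries the match at every offset in the line, so a failure costs a scan of the whole line; with `^`, there is exactly one attempt at offset 0.

```python
import re

# Simplified stand-in for the grok timestamp prefix (hypothetical pattern)
TS = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"
line = "some non-matching prefix 2019-05-17 16:04:35 | 3-thread-1 | DEBUG"

unanchored = re.compile(TS + r" \| ")
anchored = re.compile("^" + TS + r" \| ")

# Unanchored: the engine retries at every offset, so it still finds the
# timestamp mid-string (and a failing line is scanned end to end).
assert unanchored.search(line) is not None

# Anchored: one attempt at offset 0; non-matching lines fail immediately.
assert anchored.search(line) is None
```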

Hi @Badger

Thank you for your reply and the explanation!
This pattern works fine in the Grok Debugger, but it still causes a _grokparsefailure when applied in logstash.conf:

    filter {
      if [docker][container][labels] == "tomcat-container" {
        grok {
          pattern_definitions => { "MYLOGPATTERN" => "^(%{DATESTAMP_EVENTLOG:timestamp}|%{TIMESTAMP_ISO8601:timestamp}) *\| %{DATA:thread} +\| %{LOGLEVEL:level} +\| %{GREEDYDATA:message}" }
          match => { "message" => "%{MYLOGPATTERN}" }
        }
      }
    }

Addition to my previous post:

The difference between testing with the Grok Debugger and applying it in logstash.conf:

The log entry as tested in the Grok Debugger (this way the pattern works):

    2019-05-17 16:04:35 | 3-thread-1 | DEBUG | z.z.z.Jobrun | Doing job run. Next run scheduled.

The raw log entry coming from the machine hosting the docker containers looks like this:

{"log":"2019-05-17 16:04:35 | 3-thread-1 | DEBUG | z.z.z.Jobrun | Doing job run. Next run scheduled.\n","stream":"stdout","time":"2019-05-17T06:04:15.521574579Z"}

I'm using the docker document_type together with the add_docker_metadata configuration, and that part works like a charm: I'm getting the Docker metadata in Kibana:

(screenshot, 2019-05-28: Docker metadata fields visible in Kibana)

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.