After Updating Logstash to 7.7 from 7.6 regex isn't working the same

I recently updated to Logstash 7.7 from 7.6, after this update the following is occurring:

A typical log line looks like this:
2019-10-02T04:00:09.254Z - ^[[32minfo^[[39m: Activity:content-finish 35692

When processing from Filebeat, the filter fails with a "_grokparsefailure" and the resulting message line looks like this in stdout:
"message" => "2019-10-02T04:00:09.254Z - \e[32minfo\e[39m: Activity:content-finish 35692"

The difference is the LOGLEVEL ANSI Color codes, which seem to change from:
^[[32minfo^[[39m
to
\e[32minfo\e[39m

This was working correctly, before the update.

Here is my config:

input { 
  beats {
    port => "5044"
  }
}

filter {
  if [message] =~ /^\d+/ {
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}%{SPACE}-%{SPACE}(\^\[\[\d+m)%{LOGLEVEL:level}(\^\[\[\d+m):%{SPACE}%{GREEDYDATA:message}" }
      overwrite => { "message" => "%{message}" }
      remove_field => [ "host", "agent" ]
    }

    grok {
      match => { "[log][file][path]" => "/%{WORD}/%{WORD}/%{WORD}/%{GREEDYDATA:id}/%{NUMBER:uploadepoch}/%{GREEDYDATA:logname}.log*" }
    }

    date {
      match => [ "timestamp", "ISO8601" ]
      remove_field => [ "timestamp" ]
    }

    date {
      match => [ "uploadepoch", "UNIX_MS" ]
      target => "uploadDate"
      remove_field => [ "uploadepoch" ]
    }

    ruby {
      code => 'event.set( "logUploadYear", event.get("uploadDate").time.strftime("%Y"))'
      remove_field => [ "uploadDate" ]
    }

    if "beats_input_codec_plain_applied" in [tags] {
      mutate {
        remove_tag => [ "beats_input_codec_plain_applied" ]
      }
    }
  } else {
    drop{ }
  }
}

output {
  if "_grokparsefailure" in [tags] {
    stdout { codec => rubydebug }
  }
}

Any Ideas?

I should note, I have tried changing the regex to represent what is coming through in stdout:
(\\e\[+\d+m)%{LOGLEVEL:level}(\\e\[+\d+m)
But this doesn't resolve the issue...

An ANSI control sequence introductor consists of an escape character followed by a square bracket. ESC is control+[, which would routinely be printed as ^[. I wonder if something has changed torepresenting escape as \e instead of ^[

Perhaps it has. But wouldn’t my second regex have caught it with this (\\e\[+\d+m)?

I think @Badger has pointed out the root cause.
For further reference , I have tested you configuration.

With ingesting,, 2019-10-02T04:00:09.254Z - \e[32minfo\e[39m: Activity:content-finish 35692
grok_parse_error happens on version 7.6.2 .

OUTPUT

logstash_1  | /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/awesome_print-1.7.0/lib/awesome_print/formatters/base_formatter.rb:31: warning: constant ::Fixnum is deprecated
logstash_1  | {
logstash_1  |       "@version" => "1",
logstash_1  |            "log" => {
logstash_1  |         "offset" => 462,
logstash_1  |           "file" => {
logstash_1  |             "path" => "/mnt/logs/terminallog.log"
logstash_1  |         }
logstash_1  |     },
logstash_1  |            "ecs" => {
logstash_1  |         "version" => "1.4.0"
logstash_1  |     },
logstash_1  |        "message" => "2019-10-02T04:00:09.254Z - \\e[32minfo\\e[39m: Activity:content-finish 35692",
logstash_1  |           "host" => {
logstash_1  |         "name" => "filebeat"
logstash_1  |     },
logstash_1  |          "input" => {
logstash_1  |         "type" => "log"
logstash_1  |     },
logstash_1  |          "agent" => {
logstash_1  |             "hostname" => "filebeat",
logstash_1  |                   "id" => "14419fef-40d7-489f-b513-ff436a6eadd5",
logstash_1  |              "version" => "7.6.2",
logstash_1  |                 "type" => "filebeat",
logstash_1  |         "ephemeral_id" => "4ca1334e-c220-4b36-ad85-7167ec0ab68e"
logstash_1  |     },
logstash_1  |           "tags" => [
logstash_1  |         [0] "beats_input_codec_plain_applied",
logstash_1  |         [1] "_grokparsefailure"
logstash_1  |     ],
logstash_1  |     "@timestamp" => 2020-05-24T02:14:19.760Z
logstash_1  | }

Fixing the match in the grok filter will fix the error.
I commented out other filters for narrowing down the issue.

filter {
  if [message] =~ /^\d+/ {
    grok {

# From
#      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}%{SPACE}-%{SPACE}(\^\[\[\d+m)%{LOGLEVEL:level}(\^\[\[\d+m):%{SPACE}%{GREEDYDATA:message}" }

# To
      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}%{SPACE}-%{SPACE}(\\e\[\d+m)%{LOGLEVEL:level}(\\e\[\d+m):%{SPACE}%{GREEDYDATA:message}" }

      overwrite => { "message" => "%{message}" }
      remove_field => [ "host", "agent" ]
    }

#    grok {
#      match => { "[log][file][path]" => "/%{WORD}/%{WORD}/%{WORD}/%{GREEDYDATA:id}/%{NUMBER:uploadepoch}/%{GREEDYDATA:logname}.log*" }
#    }
#
#    date {
#      match => [ "timestamp", "ISO8601" ]
#      remove_field => [ "timestamp" ]
#    }
#
#    date {
#      match => [ "uploadepoch", "UNIX_MS" ]
#      target => "uploadDate"
#      remove_field => [ "uploadepoch" ]
#    }
#
#    ruby {
#      code => 'event.set( "logUploadYear", event.get("uploadDate").time.strftime("%Y"))'
#      remove_field => [ "uploadDate" ]
#    }
#
#    if "beats_input_codec_plain_applied" in [tags] {
#      mutate {
#        remove_tag => [ "beats_input_codec_plain_applied" ]
#      }
#    }
  } else {
    drop{ }
  }
}

OUTPUT

logstash_1  | /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/awesome_print-1.7.0/lib/awesome_print/formatters/base_formatter.rb:31: warning: constant ::Fixnum is deprecated
logstash_1  | {
logstash_1  |      "timestamp" => "2019-10-02T04:00:09.254Z",
logstash_1  |        "message" => "Activity:content-finish 35692",
logstash_1  |            "log" => {
logstash_1  |           "file" => {
logstash_1  |             "path" => "/mnt/logs/terminallog.log"
logstash_1  |         },
logstash_1  |         "offset" => 537
logstash_1  |     },
logstash_1  |       "@version" => "1",
logstash_1  |          "input" => {
logstash_1  |         "type" => "log"
logstash_1  |     },
logstash_1  |            "ecs" => {
logstash_1  |         "version" => "1.4.0"
logstash_1  |     },
logstash_1  |          "level" => "info",
logstash_1  |     "@timestamp" => 2020-05-24T02:21:49.776Z,
logstash_1  |           "tags" => [
logstash_1  |         [0] "beats_input_codec_plain_applied"
logstash_1  |     ]
logstash_1  | }

Hi @YuWatanabe,

Thanks for the help on this. One quick point, the INPUT is actually this line:
2019-10-02T04:00:09.254Z - ^[[32minfo^[[39m: Activity:content-finish 35692

Then this is the actual STDOUT output:

{
    "agent" => {
        "version" => "7.7.0",
        "type" => "filebeat",
        "ephemeral_id" => "ae8f89bb-a8b1-472d-8e88-f44ee836efa6",
        "hostname" => "service-logging",
        "id" => "50f56fac-9c09-4580-b230-2e29ea38b75c"
    },
    "log" => {
        "file" => {
            "path" => "/var/devicelogs/processed/100119/1569342392059/logfile.log"
        },
        "offset" => 2584145
    },
    "host" => {
        "name" => "service-logging"
    },
    "logname" => "logfile",
    "message" => "2019-10-02T04:00:09.254Z - \e[32minfo\e[39m: Activity:content-finish 35692",
    "ecs" => {
        "version" => "1.5.0"
    },
    "logUploadYear" => "2019",
    "id" => "100119",
    "@version" => "1",
    "tags" => [
        [0] "_grokparsefailure"
    ],
    "input" => {
        "type" => "log"
    },
    "@timestamp" => 2020-05-23T16:46:37.986Z
}

The issue is that the INPUT is different than what is provided in stdout. And as I noted, I've tried updating the regex to (\\e\[+\d+m) but it does not make a difference, I get the same output.

Hello.

Lets clarify the situation step by step.
Can you comment out the whole filter stanza and post the ruby debug output ?
You might want to change output stanza as below to get the output .

 output {
  if "_grokparsefailure" in [tags] {
    stdout { codec => rubydebug }
  } else {
    stdout { codec => rubydebug }
  }
}

This can clarify the raw input what logstash is ingesting from filebeat.

Sure thing. Here is the output, without any filter:

{
    "tags" => [
        [0] "beats_input_codec_plain_applied"
    ],
    "@version" => "1",
    "host" => {
        "name" => "service-logging"
    },
    "@timestamp" => 2020-05-24T03:10:09.623Z,
    "ecs" => {
        "version" => "1.5.0"
    },
    "agent" => {
        "hostname" => "service-logging",
        "ephemeral_id" => "ae8f89bb-a8b1-472d-8e88-f44ee836efa6",
        "type" => "filebeat",
        "version" => "7.7.0",
        "id" => "50f56fac-9c09-4580-b230-2e29ea38b75c"
    },
    "log" => {
        "file" => {
            "path" => "/var/devicelogs/processed/100119/1569342392059/logfile.log"
        },
        "offset" => 2600571
    },
    "input" => {
        "type" => "log"
    },
    "message" => "2019-10-02T15:49:45.307Z - \e[32minfo\e[39m: Activity:content-finish 35692 5110905 config-180611.xml xml/media/config-180611-CL1.xml"
}

@chrisdm
This shows that incoming logs from filebeat includes escape string . Thus grok filter which does not parse \e should will fail as _grokparsefailure

 "message" => "2019-10-02T15:49:45.307Z - \e[32minfo\e[39m: Activity:content-finish 35692 5110905 config-180611.xml xml/media/config-180611-CL1.xml"

To make sure , can you also check that file , /var/devicelogs/processed/100119/1569342392059/logfile.log, includes below line ?

2019-10-02T04:00:09.254Z - \e[32minfo\e[39m: Activity:content-finish 35692

This will confirm what filebeat is actually reading from the file .

Hi!

Just to add more information, I think I have tackled similar issues in the past; maybe using the notation \x1B as the initial escape sequence is useful for your purpose.

Here is link to the information I compiled for a related issue:

1 Like

@andres-perez this is exactly what I needed! Thank you!

The original log text shows the ANSI colors as ^[[32minfo^[[39m but after digging into the Filebeat debug logs I could see Filebeat was seeing the ANSI block as \u001b[32minfo\u001b[39m but Logstash was seeing (as I mentioned earlier) those blocks as \e[32minfo\e[39m

After adding the mutate gsub line with your provided regex, it worked perfectly -- stripping out those codes.

I did have to add encoding: plain to the Filebeat config.

Thank you, again!