Parsing special characters with grok

My log file contains lines in the format below:
cat -A test3_client1_out-0.log

[2020-01-02 12:38:33.378] ^[[32mINFO ^[[39m: ^[[36m<login> success member<232687>^[[39m$
[2020-01-02 11:55:05.698] ^[[32mINFO ^[[39m: ^[[36m<login> logging in member<200000>^[[39m

So I came up with the following regex:
"\A\[%{TIMESTAMP_ISO8601:datetime}] \^\[\[32mINFO \^\[\[39m: \^\[\[36m<%{WORD:function}> %{CISCO_REASON:status}<%{INT:memId}%{GREEDYDATA:discard}"

However, after running Logstash, Filebeat sends the data to Logstash in the format below:
"message" => "[2020-01-02 11:55:06.037] \e[32mINFO \e[39m: \e[36m logging in member<200000>\e[39m"

What is happening in the above case? Why is the ^[ changing to \e? Any pointers appreciated.

PS: I even tried a regex that matches the pattern as it appears in the message, but this too was ignored.

Thanks!

These codes are special ANSI sequences used to change the color in terminal output.

In simple terms, these sequences tell a compatible console tool:
start a block of green characters, print INFO, then (end the block and) return to the default color

Both ^[ and \e are different ways of writing the same escape character (ESC, 0x1B); a good link to learn about it: http://jafrog.com/2013/11/23/colors-in-terminal.html

I don't know why filebeat uses a different way to represent the same character than your original console logs.
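
For what it's worth, the difference is most likely only in how each tool prints the byte: cat -A uses caret notation (^[), while the Logstash output shows an escaped string (\e). A minimal sketch in Python (used here only for illustration, it is not part of Filebeat or Logstash; caret_to_char is a made-up helper name) showing that both notations denote the same ESC character:

```python
ESC = "\x1b"  # the ESC control character, decimal 27


def caret_to_char(caret: str) -> str:
    """Convert caret notation such as '^[' to the control character it denotes."""
    if len(caret) != 2 or caret[0] != "^":
        raise ValueError("expected two characters like '^['")
    # Caret notation shows a control character as '^' followed by the
    # character whose code differs in bit 0x40, so XOR recovers the original.
    return chr(ord(caret[1]) ^ 0x40)


print(caret_to_char("^[") == ESC)  # True
print(ESC == "\033" == chr(27))    # True
```

So the file itself contains one ESC byte followed by a literal [, not a caret character.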

Some considerations about the regex pattern used for matching the special character in grok:

  • Matching the ^[ escape character is tricky; you can't just write the two characters "caret" and "square bracket".
    See Multiline Codec logstash for log level and https://stackoverflow.com/a/33479939

  • Have you tried \e instead of ^[ (something like \e\[32mINFO ...)?

  • You can also test another way to capture this special character with a regex: \x1B instead of either \e or ^[
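
The suggestions above can be tried outside Logstash. Below is a rough sketch in Python; the grok patterns are replaced by hand-written approximations (my assumptions, not the real grok definitions), and \x1B is used because Python's re module has no \e escape. Grok uses a different regex engine, so treat this only as a sanity check of the idea.

```python
import re

# Sample line as it arrives in the "message" field, with real ESC bytes.
line = (
    "[2020-01-02 12:38:33.378] \x1b[32mINFO \x1b[39m: "
    "\x1b[36m<login> success member<232687>\x1b[39m"
)

# Literal caret + bracket never matches: the data holds ESC, not '^'.
caret_pattern = re.compile(r"\^\[\[32mINFO")

# Matching the ESC byte itself via \x1B works.
esc_pattern = re.compile(
    r"\A\[(?P<datetime>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\] "
    r"\x1B\[32mINFO \x1B\[39m: \x1B\[36m"
    r"<(?P<function>\w+)> (?P<status>[^<]+)<(?P<memId>\d+)>"
)

caret_match = caret_pattern.search(line)
m = esc_pattern.match(line)

print(caret_match)                        # None
print(m.groupdict() if m else "no match")
```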


Although it is not what you asked for, I will add a solution that works for my use cases; maybe it fits your purpose.

Usually I don't have control over which colors or formats the application logs will use in the systems I work with. Right now you are parsing green INFO messages, but ERROR ones will probably use red. Who knows what formats will be used in the future for other information...

When ANSI formatting is used, I prefer to strip the whole ANSI sequences before any other parsing (groks and similar filters). This makes the latter more readable and robust.

mutate {
  id => "[Meaningful label for your project, not repeated in any other config files] remove ANSI color codes"
  # Matches ESC [ <optional numeric parameters> followed by m (color) or K (erase line)
  gsub => ["message", "\x1B\[([0-9]{1,2}(;[0-9]{1,2})?)?[mK]", ""]
}
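
To see what this substitution does to one of the lines from the question, here is a small Python sketch of the same idea. It is only an illustration of the regex, not how Logstash runs it; strip_ansi is a made-up helper name, and the final character class is written as [mK] because inside a class | is a literal character rather than an "or".

```python
import re

# ESC [ + optional one or two numeric parameters + final 'm' or 'K'.
ANSI_RE = re.compile(r"\x1B\[([0-9]{1,2}(;[0-9]{1,2})?)?[mK]")


def strip_ansi(message: str) -> str:
    """Remove simple ANSI color/erase sequences from a log message."""
    return ANSI_RE.sub("", message)


raw = (
    "[2020-01-02 11:55:05.698] \x1b[32mINFO \x1b[39m: "
    "\x1b[36m<login> logging in member<200000>\x1b[39m"
)
clean = strip_ansi(raw)
print(clean)
# [2020-01-02 11:55:05.698] INFO : <login> logging in member<200000>
```

Once the sequences are gone, the grok pattern no longer needs to mention colors at all, so it keeps working if the application later switches to a different color for another log level.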

Thanks for this tip. I made the change in the code that generates the log itself.

Will keep this in mind for the next time.

Thanks!
