Parsing special characters with grok

chivas · January 2, 2020, 10:04am

My log file contains lines in the format below:
cat -A test3_client1_out-0.log

[2020-01-02 12:38:33.378] ^[[32mINFO ^[[39m: ^[[36m<login> success member<232687>^[[39m$
[2020-01-02 11:55:05.698] ^[[32mINFO ^[[39m: ^[[36m<login> logging in member<200000>^[[39m

So I came up with the following regex:
"\A\[%{TIMESTAMP_ISO8601:datetime}] \^\[\[32mINFO \^\[\[39m: \^\[\[36m<%{WORD:function}> %{CISCO_REASON:status}<%{INT:memId}%{GREEDYDATA:discard}"

However, after running Logstash, Filebeat sends the data to Logstash in the format below:
"message" => "[2020-01-02 11:55:06.037] \e[32mINFO \e[39m: \e[36m logging in member<200000>\e[39m"

What is happening in the above case? Why is the ^[ changing to \e? Any pointers appreciated.

PS: I even tried a regex that matches the pattern as it appears in the message, but that was ignored too.

Thanks!

andres-perez · January 2, 2020, 12:27pm

These codes are special ANSI sequences used to change the color in terminal output.

In simple terms, these codes tell a compatible terminal:
start a block of green characters, print INFO, then end the block and return to the default color

Both ^[ and \e are notations for the same single character: ESC (byte 0x1B), which starts an ANSI escape sequence. cat -A prints it in caret notation as ^[. A good link to learn about it: http://jafrog.com/2013/11/23/colors-in-terminal.html

I don't know why filebeat uses a different way to represent the same character than your original console logs.
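
One way to check that the events really contain the ESC byte, whatever notation the output uses to display it, is to tag them with a conditional. This is an untested sketch: it assumes the text is in the message field as in the post, and the tag name has_ansi is made up for illustration.

```
filter {
  # \x1B is the ESC byte itself; ^[ and \e are only ways of displaying it
  if [message] =~ /\x1B\[/ {
    mutate {
      # "has_ansi" is a hypothetical tag name chosen for this example
      add_tag => ["has_ansi"]
    }
  }
}
```

If the tag shows up on your events, the byte is present and a pattern written with \x1B should be able to match it.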

Some considerations about the regex pattern used for matching the special character in grok:

Matching the ^[ escape character is tricky: you can't just write the two characters "caret" and "square bracket", because ^[ stands for a single control character.
See Multiline Codec logstash for log level and https://stackoverflow.com/a/33479939
Have you tried \e instead of ^[ (something like \e\[32mINFO ...)?
You can also test another way to capture this special character in a regex: \x1B instead of either \e or ^[
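
As a rough illustration of that suggestion, here is the pattern from the question with every \^\[ swapped for \x1B and nothing else changed. It is an untested sketch and assumes the events look like the cat -A sample, including the <login> part.

```
filter {
  grok {
    match => {
      # \x1B matches the single ESC byte; \[ matches the literal "[" that follows it
      "message" => "\A\[%{TIMESTAMP_ISO8601:datetime}] \x1B\[32mINFO \x1B\[39m: \x1B\[36m<%{WORD:function}> %{CISCO_REASON:status}<%{INT:memId}%{GREEDYDATA:discard}"
    }
  }
}
```

Note that this still hardcodes the green code (32) and the INFO level, so lines logged in another color or at another level would not match.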

Although it is not what you ask for, I will add a solution that works for my use cases; maybe it fits your purpose.

Usually I don't have control over which colors or formats the application logs will use in the systems that I work with. Now you are parsing green INFO messages, but ERROR ones will probably use red. Who knows what formats will be used in the future for other information...

When ANSI formatting is used, I prefer to strip all ANSI sequences before any other parsing (groks and similar filters). This makes the later filters more readable and robust.

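mutate {
  id => "[Meaningful label for your project, not repeated in any other config files] remove ANSI color codes"
  # ESC, "[", up to two optional numeric parameters separated by ";", then a final "m" (color) or "K" (erase line)
  gsub => ["message", "\x1B\[([0-9]{1,2}(;[0-9]{1,2})?)?[mK]", ""]
}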
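
To show how the two steps could fit together, here is an untested sketch that strips the sequences first and then runs a color-free grok. The grok pattern and the level field name are my own assumptions based on the cat -A sample in the question, not something taken from a working config. The final character class is written as [mK] because a | inside brackets is a literal character, not an alternation.

```
filter {
  # Step 1: remove the ANSI sequences so later filters see plain text
  mutate {
    gsub => ["message", "\x1B\[([0-9]{1,2}(;[0-9]{1,2})?)?[mK]", ""]
  }
  # Step 2: parse the cleaned line, e.g.
  # [2020-01-02 12:38:33.378] INFO : <login> success member<232687>
  grok {
    match => {
      "message" => "\A\[%{TIMESTAMP_ISO8601:datetime}\] %{LOGLEVEL:level} : <%{WORD:function}> %{DATA:status}<%{INT:memId}>"
    }
  }
}
```

Because the level is captured with LOGLEVEL instead of a hardcoded INFO, the same pattern should also cover ERROR or WARN lines regardless of the color they were printed in.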

chivas · January 3, 2020, 6:07am

Thanks for the tip. I made this change in the code that generates the log itself.

Will keep this in mind for the next time.

Thanks!

