So I came up with the following regex: "\A\[%{TIMESTAMP_ISO8601:datetime}] \^\[\[32mINFO \^\[\[39m: \^\[\[36m<%{WORD:function}> %{CISCO_REASON:status}<%{INT:memId}%{GREEDYDATA:discard}"
However, when I run Logstash, Filebeat delivers the data to it in the format below: "message" => "[2020-01-02 11:55:06.037] \e[32mINFO \e[39m: \e[36m logging in member<200000>\e[39m"
What is happening here? Why is the ^[ changing to \e? Any pointers appreciated.
PS: I even tried a regex that matches the pattern as it appears in the message, but that was ignored too.
Have you tried \e instead of ^[ (something like \e\[32mINFO ... ) ?
You can also test another way to capture this special character with a regex: \x1B instead of either \e or ^[
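For what it's worth, ^[ and \e are two spellings of the same single character: ESC, byte 0x1B. Editors and terminals tend to display it in caret notation (^[), while Logstash's debug output prints it as \e. Here is a small Python sketch to check that outside Logstash, assuming the sample message from the question. Python's re module does not accept \e, so the sketch uses \x1B; the variable names are only for illustration.

```python
import re

# ESC is the control character with code 27 (0x1B).
ESC = "\x1b"
assert ESC == chr(27) == "\033"

# Caret notation: ^[ means "the character [ with 0x40 cleared", i.e. 91 - 64 = 27.
assert ord("[") - 64 == 27

# Sample message from the question, with \e written as \x1b.
sample = (
    "[2020-01-02 11:55:06.037] \x1b[32mINFO \x1b[39m: "
    "\x1b[36m logging in member<200000>\x1b[39m"
)

# Match the colored log level by spelling ESC as \x1B in the regex.
level = re.search(r"\x1B\[32m(\w+) \x1B\[39m", sample)
print(level.group(1))
```

If this matches in Python but the equivalent grok still fails, the difference is probably somewhere else in the pattern rather than in the escape character.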
Although it is not what you asked for, I will add a solution that works for my use cases; maybe it fits your purpose.
Usually I have no control over which colors or formats the application logs will use in the systems I work with. Right now you are parsing green INFO messages, but ERROR ones will probably be red. Who knows what formats will be used in the future for other information...
When ANSI formatting is used, I prefer to strip the whole ANSI sequences before any other parsing (groks and similar filters). This makes the latter more readable and robust.
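mutate {
  id => "[Meaningful label for your project, not repeated in any other config files] remove ANSI color codes"
  # Strip sequences such as ESC[32m, ESC[1;31m, ESC[m and ESC[K from the message.
  # [mK] is a character class; a | inside it would match a literal pipe.
  gsub => ["message", "\x1B\[([0-9]{1,2}(;[0-9]{1,2})?)?[mK]", ""]
}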
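If you want to try the stripping pattern before putting it into the pipeline, here is a minimal Python sketch of the same idea. It assumes the sample message from the question and writes the final character class as [mK]; the strip_ansi helper is a hypothetical name, not part of Logstash.

```python
import re

# ESC [ followed by zero, one or two numeric parameters (1-2 digits each),
# ending in m (color/style) or K (erase line).
ANSI_RE = re.compile(r"\x1B\[([0-9]{1,2}(;[0-9]{1,2})?)?[mK]")


def strip_ansi(text: str) -> str:
    """Return text with the ANSI sequences matched by ANSI_RE removed."""
    return ANSI_RE.sub("", text)


sample = (
    "[2020-01-02 11:55:06.037] \x1b[32mINFO \x1b[39m: "
    "\x1b[36m logging in member<200000>\x1b[39m"
)
cleaned = strip_ansi(sample)
print(repr(cleaned))
```

Once the codes are gone, the grok that follows only has to deal with plain text. Note that the spaces around the removed codes stay in place, so there are two spaces after "INFO :" in the cleaned message.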