Unable to parse log containing UNICODE characters and ANSI colour codes using grok


(Safiyat) #1

I am trying to parse the following line:

2015-09-17 17:44:49.663 ^[[00;32mDEBUG oslo_concurrency.lockutils [^[[00;36m-^[[00;32m] ^[[01;35m^[[00;32mAcquired semaphore "singleton_lock"^[[00m ^[[00;33mfrom (pid=30534) lock /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:198^[[00m

The character ^[ is actually the ESC control character, whose octal code is \033 and hex code is \x1B.

Substrings such as ^[[00;32m are ANSI colour codes, which, when written to a terminal, are rendered like this.
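
To make the structure of these codes concrete, here is a minimal Python sketch (not Logstash) that removes them from the sample line. It assumes every colour code in the log has the form ESC, "[", digits separated by ";", then "m", and it does not cover other ANSI escape sequences. The names ANSI_SGR and strip_ansi are made up for this illustration.

```python
import re

# ESC (\x1B), a literal "[", optional numeric parameters separated by ";",
# and a final "m". Assumption: only colour/style codes of this shape occur.
ANSI_SGR = re.compile(r"\x1B\[[0-9;]*m")


def strip_ansi(line: str) -> str:
    """Return the line with the colour sequences removed."""
    return ANSI_SGR.sub("", line)


# The sample log line, with ^[ written as the real ESC character (\x1b).
sample = (
    "2015-09-17 17:44:49.663 \x1b[00;32mDEBUG oslo_concurrency.lockutils "
    "[\x1b[00;36m-\x1b[00;32m] \x1b[01;35m\x1b[00;32mAcquired semaphore "
    '"singleton_lock"\x1b[00m \x1b[00;33mfrom (pid=30534) lock '
    "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:198\x1b[00m"
)

print(strip_ansi(sample))
```

If the same expression can be applied to the message before grok runs, the plaintext pattern should in principle apply unchanged, but I have not confirmed that in Logstash.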

I need to parse this log line but have been unable to do so. I tried using the color-stripper plugin, but it did not work for me.

I am able to parse the log line in plaintext using the pattern:

%{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{LOGLEVEL:loglevel}%{SPACE}%{NOTSPACE:api}%{SPACE}\[(?:%{DATA:request})\]%{SPACE}%{GREEDYDATA:message}
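
For reference, the fields I expect from the pattern above can be sketched as an ordinary Python regular expression applied to the plaintext version of the line. The sub-patterns are simplified stand-ins that I chose for illustration, not the actual grok definitions of TIMESTAMP_ISO8601, LOGLEVEL and so on, and the name LINE is hypothetical.

```python
import re

# Simplified stand-ins for the grok pieces (assumptions, not grok's definitions):
#   TIMESTAMP_ISO8601 -> date, space, time with fractional seconds
#   LOGLEVEL          -> a run of capital letters
#   NOTSPACE          -> \S+    SPACE -> \s*    DATA -> .*?    GREEDYDATA -> .*
LINE = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s*"
    r"(?P<loglevel>[A-Z]+)\s*"
    r"(?P<api>\S+)\s*"
    r"\[(?P<request>.*?)\]\s*"
    r"(?P<message>.*)"
)

# The sample line with the colour codes already removed.
plain = (
    "2015-09-17 17:44:49.663 DEBUG oslo_concurrency.lockutils [-] "
    'Acquired semaphore "singleton_lock" from (pid=30534) lock '
    "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:198"
)

fields = LINE.match(plain).groupdict()
for name, value in fields.items():
    print(f"{name}: {value}")
```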

How do I parse the coloured log line?
To parse it at the character level, we would need to match the Unicode character \u001B. Is there a way to do it by matching that character, or any alternative approach?

The related Stack Overflow question can be found here.


(system) #2