Filebeat inserting extra Unicode characters

This is the original entry in the log file:

2024-11-25 23:14:27,671 INFO o.s.w.s.c.WebSocketMessageBrokerStats [MessageBroker-1] WebSocketSession[0 current WS(0)-HttpStream(0)-HttpPoll(0), 0 total, 0 closed abnormally (0 connect failure, 0 send limit, 0 transport error)], stompSubProtocol[processed CONNECT(0)-CONNECTED(0)-DISCONNECT(0)], stompBrokerRelay[null], inboundChannel[pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0], outboundChannel[pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0], sockJsScheduler[pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 197]

I copied this file to my Windows machine and opened it in Notepad. The bottom of the Notepad window says the file is Unix (LF) UTF-8.

This is the same log record as shown in Kibana/Elasticsearch:

\u001b[30m2024-11-25 22:44:27,670\u001b[0;39m \u001b[34mINFO \u001b[0;39m [\u001b[34mMessageBroker-1\u001b[0;39m] \u001b[33mo.s.w.s.c.WebSocketMessageBrokerStats\u001b[0;39m: WebSocketSession[0 current WS(0)-HttpStream(0)-HttpPoll(0), 0 total, 0 closed abnormally (0 connect failure, 0 send limit, 0 transport error)], stompSubProtocol[processed CONNECT(0)-CONNECTED(0)-DISCONNECT(0)], stompBrokerRelay[null], inboundChannel[pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0], outboundChannel[pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0], sockJsScheduler[pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 196]\n

So somewhere between Filebeat parsing the log file, shipping its content to Elasticsearch, and Kibana displaying it, something is inserting extra characters like \u001b[30m and \u001b[34m.

These are the applicable settings in our filebeat-kubernetes.yaml file:

    - type: filestream
      id: ceo-api-dev1-container-logs
      paths:
        - /var/log/containers/ceo-api-*.log
      encoding: utf-8
      fields_under_root: true
      fields:
        data_stream.type: logs
        data_stream.dataset: ceo
        data_stream.namespace: api
        app_id: ceo-api-dev1
      parsers:
        - container: ~
      prospector:
        scanner:
          fingerprint.enabled: true
          symlinks: true
      file_identity.fingerprint: ~
      processors:
        - add_kubernetes_metadata:
            host: ${NODE_NAME}
            namespace: ceo-dev1
            matchers:
            - logs_path:
                logs_path: "/var/log/containers/"

According to the Filebeat docs, the encoding: utf-8 setting should have told Filebeat to parse the log file as utf-8 characters.

Is there a way to prevent these extra characters from being added, or are we stuck hacking around them with a Grok processor or adding Logstash to our setup to strip them out?

Are you sure this isn't from the source?

What you shared are ANSI escape codes for colors. Does your application generate logs in color? Which language is it written in? Some logging frameworks add colors by default.

Testing with echo -e, this is the result:

I don't think that there is anything in the stack that would add those characters in that way.
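To make the diagnosis concrete: the sequences in the Kibana record can be reproduced with echo -e, and, if needed, stripped downstream. This is a minimal sketch; the sed one-liner is my own addition (it assumes GNU sed's \x1b support) and is not from the thread:

```shell
# \u001b is the ESC character; ESC followed by "[<codes>m" is an
# ANSI SGR (color) sequence. ESC[34m switches the foreground to
# blue, ESC[0;39m resets attributes and restores the default color.
echo -e '\033[34mINFO\033[0;39m plain'   # renders "INFO" in blue

# The sequences can also be stripped after the fact (GNU sed):
echo -e '\033[34mINFO\033[0;39m plain' | sed 's/\x1b\[[0-9;]*m//g'
```

If the sequences render as colors in your terminal, they were already present in the bytes Filebeat read; nothing in the Filebeat-to-Kibana path added them.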

Thanks!

It turns out the application had a logback-spring.xml file that was configured to use PatternLayout. The pattern contained color conversion words, which logback encodes as ANSI escape sequences when writing each log line.

Deleting the color conversion words solved the issue.
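For anyone hitting the same thing: a PatternLayout using logback's composite color converters looks roughly like this. This is a sketch only; the exact pattern from the application isn't shown in the thread, but the colors in the captured record (\u001b[30m black, \u001b[34m blue, \u001b[33m yellow) are consistent with wrappers like these:

```xml
<appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
  <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
    <!-- %black/%blue/%yellow are logback color converters; each one
         wraps its argument in ANSI escape sequences such as
         \u001b[30m ... \u001b[0;39m -->
    <pattern>%black(%d{yyyy-MM-dd HH:mm:ss,SSS}) %blue(%-5level) [%blue(%thread)] %yellow(%logger{40}): %msg%n</pattern>
  </encoder>
</appender>
```

Removing the %color(...) wrappers while keeping the inner conversion words (%d, %-5level, %thread, %logger, %msg) produces plain, uncolored output that ships cleanly through Filebeat.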