This is the original entry in the log file:
2024-11-25 23:14:27,671 INFO o.s.w.s.c.WebSocketMessageBrokerStats [MessageBroker-1] WebSocketSession[0 current WS(0)-HttpStream(0)-HttpPoll(0), 0 total, 0 closed abnormally (0 connect failure, 0 send limit, 0 transport error)], stompSubProtocol[processed CONNECT(0)-CONNECTED(0)-DISCONNECT(0)], stompBrokerRelay[null], inboundChannel[pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0], outboundChannel[pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0], sockJsScheduler[pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 197]
I copied this file to my Windows machine and opened it in Notepad. The bottom of the Notepad window says the file is Unix (LF) UTF-8.
This is the same log record as shown in Kibana/Elasticsearch:
\u001b[30m2024-11-25 22:44:27,670\u001b[0;39m \u001b[34mINFO \u001b[0;39m [\u001b[34mMessageBroker-1\u001b[0;39m] \u001b[33mo.s.w.s.c.WebSocketMessageBrokerStats\u001b[0;39m: WebSocketSession[0 current WS(0)-HttpStream(0)-HttpPoll(0), 0 total, 0 closed abnormally (0 connect failure, 0 send limit, 0 transport error)], stompSubProtocol[processed CONNECT(0)-CONNECTED(0)-DISCONNECT(0)], stompBrokerRelay[null], inboundChannel[pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0], outboundChannel[pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0], sockJsScheduler[pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 196]\n
So somewhere in the process of Filebeat parsing the log file, shipping its content to Elasticsearch, and Kibana displaying it, something is inserting extra characters such as \u001b[30m and \u001b[34m.
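For reference, those inserted bytes decode to ANSI SGR color sequences: \u001b is the ESC control character, ESC[30m sets the foreground color to black, ESC[34m sets it to blue, and ESC[0;39m resets it to the default. A quick Python check, which also shows how the sequences could be stripped after the fact (the sample line is abbreviated):

    import re

    # Abbreviated copy of the line as it appears in Kibana; each
    # \u001b[...m group is an ANSI SGR (color) escape sequence.
    line = "\u001b[30m2024-11-25 22:44:27,670\u001b[0;39m \u001b[34mINFO \u001b[0;39m ..."

    # ANSI SGR sequences have the form ESC [ <numeric params> m
    ansi_sgr = re.compile(r"\x1b\[[0-9;]*m")
    print(ansi_sgr.sub("", line))  # -> "2024-11-25 22:44:27,670 INFO  ..."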
The applicable settings in our filebeat-kubernetes.yaml file:

    - type: filestream
      id: ceo-api-dev1-container-logs
      paths:
        - /var/log/containers/ceo-api-*.log
      encoding: utf-8
      fields_under_root: true
      fields:
        data_stream.type: logs
        data_stream.dataset: ceo
        data_stream.namespace: api
        app_id: ceo-api-dev1
      parsers:
        - container: ~
      prospector:
        scanner:
          fingerprint.enabled: true
          symlinks: true
      file_identity.fingerprint: ~
      processors:
        - add_kubernetes_metadata:
            host: ${NODE_NAME}
            namespace: ceo-dev1
            matchers:
              - logs_path:
                  logs_path: "/var/log/containers/"
According to the Filebeat docs, the encoding: utf-8 setting should have told Filebeat to decode the log file as UTF-8 characters.
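Worth noting: ESC (0x1b) is a plain ASCII byte and therefore valid UTF-8, so a clean UTF-8 decode keeps the color sequences intact rather than removing them. A one-liner to confirm:

    # ESC (0x1b) is valid UTF-8, so decoding the bytes as UTF-8
    # preserves the escape sequences rather than dropping them.
    raw = b"\x1b[30m2024-11-25 22:44:27,670\x1b[0;39m"
    print(raw.decode("utf-8"))  # decodes without error, ESC bytes and all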
Is there a way to prevent these extra characters from being added, or are we stuck hacking with a Grok processor or adding Logstash to our setup to get rid of them?
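For context, the kind of hack we are hoping to avoid would be something like an Elasticsearch ingest pipeline with a gsub processor that deletes the escape sequences. A sketch using the elasticsearch Python client (8.x-style API); the pipeline name strip-ansi and the URL are placeholders:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")  # placeholder URL

    # gsub deletes every ANSI SGR sequence (ESC [ <params> m) from message.
    es.ingest.put_pipeline(
        id="strip-ansi",  # hypothetical pipeline name
        description="Remove ANSI color escape sequences from log messages",
        processors=[
            {
                "gsub": {
                    "field": "message",
                    "pattern": "\\u001b\\[[0-9;]*m",
                    "replacement": "",
                }
            }
        ],
    )

Filebeat could then reference such a pipeline via the pipeline option of its elasticsearch output, but that still amounts to cleaning up characters that should not be there in the first place.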