Description
We are using Filebeat to collect data from about 300 hosts running different operating systems (Windows, Ubuntu, and Mac OS).
Each week, Filebeat collects around 125 million events from .ndjson files and ships them to Elasticsearch through Logstash.
The problem is that we intermittently get JSON decoding errors from Ubuntu hosts, such as:
Error decoding JSON: invalid character '_' in literal true (expecting 'r')
Error decoding JSON: invalid character 'T' looking for beginning of value
Error decoding JSON: invalid character 'l' looking for beginning of value
Error decoding JSON: invalid character 'a' looking for beginning of value
Error decoding JSON: invalid character 's' in literal null (expecting 'u')
Error decoding JSON: json: cannot unmarshal number into Go value of type map[string]interface {}
Of the 125M events per week, about 1K are Filebeat decoding errors. We have seen this only on Ubuntu, never on any other OS.
OS breakdown for the 125M events per week:
- Windows: 66%
- Ubuntu: 33%
- Others (e.g., Mac OS): 1%
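For scale, Ubuntu accounts for roughly 0.33 × 125M ≈ 41M events per week, so ~1K errors corresponds to an error rate of about 1,000 / 41,000,000 ≈ 0.0024% of Ubuntu events (roughly 1 in 41,000), or about 0.0008% of all 125M events.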
The decoding errors are not consistent: we have analyzed several of the affected .ndjson files, and they are valid NDJSON. We have even copied the same file into the same Filebeat input path to re-harvest it, and the problem does not reproduce.
The log.offset at which the decoding error occurs is not consistent either: we see offsets near the beginning of files, near the end, in the middle, and so on.
Example
Here is the Filebeat log for one instance of this issue (private details have been replaced by <>):
2020-12-15T13:00:02.885-0600 INFO log/harvester.go:302 Harvester started for file: <path>/harvest/duration-ckiqcovu500009povay7jlkbg.ndjson
2020-12-15T13:00:02.886-0600 INFO log/harvester.go:302 Harvester started for file: <path>/harvest/usage-ckiqcoumv000086ov90tbaujp.ndjson
2020-12-15T13:00:02.886-0600 INFO log/harvester.go:302 Harvester started for file: <path>/harvest/usage-ckiqcothy00006uovasrxxjis.ndjson
2020-12-15T13:00:02.886-0600 INFO log/harvester.go:302 Harvester started for file: <path>/harvest/usage-ckiqcow0u00039pov70wjba7t.ndjson
2020-12-15T13:00:02.887-0600 INFO log/harvester.go:302 Harvester started for file: <path>/harvest/usage-ckiqcosls000066ovi7m1iic6.ndjson
2020-12-15T13:00:02.887-0600 INFO log/harvester.go:302 Harvester started for file: <path>/harvest/usage-ckiqcoqac00002xovjtuxf8gc.ndjson
2020-12-15T13:00:02.887-0600 INFO log/harvester.go:302 Harvester started for file: <path>/harvest/usage-ckiqcorjo00003yov0af7gp2s.ndjson
2020-12-15T13:00:02.887-0600 ERROR [reader_json] readjson/json.go:57 Error decoding JSON: invalid character 'p' looking for beginning of value
2020-12-15T13:00:02.888-0600 INFO log/harvester.go:302 Harvester started for file: <path>/harvest/usage-ckiqcowhu0000c2ovubpqjr9h.ndjson
2020-12-15T13:00:02.888-0600 ERROR [reader_json] readjson/json.go:57 Error decoding JSON: invalid character 'a' in literal true (expecting 'r')
2020-12-15T13:00:02.890-0600 ERROR [reader_json] readjson/json.go:57 Error decoding JSON: json: cannot unmarshal number into Go value of type map[string]interface {}
2020-12-15T13:00:02.892-0600 INFO [publisher_pipeline_output] pipeline/output.go:143 Connecting to backoff(async(tcp://<logstash_host>:5062))
Here is one of the events with an error (private details have been replaced by <>):
timestamp December 15th 2020, 13:00:02.888
@version 1
agent.type filebeat
agent.version 7.10.0
data.error.message Error decoding JSON: invalid character 'a' in literal true (expecting 'r')
data.error.type json
host.architecture x86_64
host.containerized false
host.os.codename focal
host.os.family debian
host.os.kernel 5.4.0-56-generic
host.os.name Ubuntu
host.os.platform ubuntu
host.os.version 20.04.1 LTS (Focal Fossa)
log.file.path <path>/harvest/usage-ckiqcowhu0000c2ovubpqjr9h.ndjson
log.offset 206
message tallers/linuxdesktop/<value>","<value>"]]},"subroutine":"<value>"}
tags beats_input_codec_plain_applied
Filebeat Details
- Version: 7.10.0 (we saw this in 7.7.0 too)
- YAML config file (private details have been replaced by <>):
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - <path>/harvest/*-*.ndjson
  backoff: 10s
  json.keys_under_root: false
  json.add_error_key: true
processors:
  - drop_event:
      when:
        contains:
          json.error.message: "Error decoding JSON: EOF"
  - rename:
      fields:
        - from: "json"
          to: "data"
  - add_fields:
      target: host
      fields:
        metrics:
          running_version: 2.0.2
  - add_host_metadata: ~
  - add_locale: ~
  - rename:
      fields:
        - from: "event.timezone"
          to: "host.timezone"
  - drop_fields:
      fields: [
        "agent.id",
        "agent.ephemeral_id",
        "agent.name",
        "agent.hostname",
        "ecs",
        "host.id",
        "host.mac",
        "host.hostname",
        "host.os.build",
        "input"
        ]
      ignore_missing: true
queue.disk:
  max_size: 256MiB
  retry_interval: 5s
  max_retry_interval: 60s
output.logstash:
  hosts:
    - <list_of_hosts>
  loadbalance: true
  compression_level: 4