Filebeat on Ubuntu hosts getting sporadic error: "Error decoding JSON:"

Description

We are using Filebeat to collect data from multiple systems (~300 different hosts) and OSes (Windows, Ubuntu, Mac OS).

We are collecting around 125 million events per week from .ndjson files, using Filebeat to upload them to Elasticsearch through Logstash.

The problem is that we are getting sporadic/intermittent decoding errors from Ubuntu hosts like these:

Error decoding JSON: invalid character '_' in literal true (expecting 'r')
Error decoding JSON: invalid character '_' in literal true (expecting 'r')
Error decoding JSON: invalid character 'T' looking for beginning of value
Error decoding JSON: invalid character 'l' looking for beginning of value
Error decoding JSON: invalid character 'a' looking for beginning of value
Error decoding JSON: invalid character 's' in literal null (expecting 'u')
Error decoding JSON: json: cannot unmarshal number into Go value of type map[string]interface {}

Out of the 125M events per week, ~1K events are Filebeat decoding errors. We have not seen this happen on any other OS, only Ubuntu.

OS breakdown for the 125M events per week:

  • Windows ____________ 66%
  • Ubuntu ______________ 33%
  • Others (like Mac OS) _ 1%

The decoding error is not consistent. We have analyzed several .ndjson files, and they are valid ndjson. We have even tried copying the same file into the same Filebeat input path to re-harvest it, and the problem does not reproduce.

The log.offset where the decoding error happens is not consistent either: we see offsets at the beginning of the files, in the middle, and almost at the end.
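For anyone who wants to repeat the "files are valid" check, here is a minimal Python sketch of one way to do it. It is not the tooling used above; the function name `find_invalid_lines` is hypothetical. It reads the file as bytes so that the reported positions are byte offsets, comparable to log.offset.

```python
import json


def find_invalid_lines(path):
    """Return (byte_offset, text, error) for every line that is not a JSON object."""
    problems = []
    offset = 0  # byte position where the current line starts
    with open(path, "rb") as handle:
        for raw in handle:
            text = raw.decode("utf-8", errors="replace").strip()
            if text:  # blank lines are skipped, not reported
                try:
                    value = json.loads(text)
                    if not isinstance(value, dict):
                        problems.append((offset, text, "not a JSON object"))
                except json.JSONDecodeError as exc:
                    problems.append((offset, text, str(exc)))
            offset += len(raw)
    return problems
```

An empty list means every non-blank line parsed as a JSON object. That only describes the file as it is on disk at the time of the check, not what a reader saw while the file was still being written.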

Example

Here is an example of the filebeat log for one instance of this issue (private details have been replaced by <>):

2020-12-15T13:00:02.885-0600    INFO    log/harvester.go:302    Harvester started for file: <path>/harvest/duration-ckiqcovu500009povay7jlkbg.ndjson
2020-12-15T13:00:02.886-0600    INFO    log/harvester.go:302    Harvester started for file: <path>/harvest/usage-ckiqcoumv000086ov90tbaujp.ndjson
2020-12-15T13:00:02.886-0600    INFO    log/harvester.go:302    Harvester started for file: <path>/harvest/usage-ckiqcothy00006uovasrxxjis.ndjson
2020-12-15T13:00:02.886-0600    INFO    log/harvester.go:302    Harvester started for file: <path>/harvest/usage-ckiqcow0u00039pov70wjba7t.ndjson
2020-12-15T13:00:02.887-0600    INFO    log/harvester.go:302    Harvester started for file: <path>/harvest/usage-ckiqcosls000066ovi7m1iic6.ndjson
2020-12-15T13:00:02.887-0600    INFO    log/harvester.go:302    Harvester started for file: <path>/harvest/usage-ckiqcoqac00002xovjtuxf8gc.ndjson
2020-12-15T13:00:02.887-0600    INFO    log/harvester.go:302    Harvester started for file: <path>/harvest/usage-ckiqcorjo00003yov0af7gp2s.ndjson
2020-12-15T13:00:02.887-0600    ERROR   [reader_json]   readjson/json.go:57 Error decoding JSON: invalid character 'p' looking for beginning of value
2020-12-15T13:00:02.888-0600    INFO    log/harvester.go:302    Harvester started for file: <path>/harvest/usage-ckiqcowhu0000c2ovubpqjr9h.ndjson
2020-12-15T13:00:02.888-0600    ERROR   [reader_json]   readjson/json.go:57 Error decoding JSON: invalid character 'a' in literal true (expecting 'r')
2020-12-15T13:00:02.890-0600    ERROR   [reader_json]   readjson/json.go:57 Error decoding JSON: json: cannot unmarshal number into Go value of type map[string]interface {}
2020-12-15T13:00:02.892-0600    INFO    [publisher_pipeline_output] pipeline/output.go:143  Connecting to backoff(async(tcp://<logstash_host>:5062))

Here is an example of one of the events with error (private details have been replaced by <>):

timestamp               December 15th 2020, 13:00:02.888
@version                1
agent.type              filebeat
agent.version           7.10.0
data.error.message      Error decoding JSON: invalid character 'a' in literal true (expecting 'r')
data.error.type         json
host.architecture       x86_64
host.containerized      false
host.os.codename        focal
host.os.family          debian
host.os.kernel          5.4.0-56-generic
host.os.name            Ubuntu
host.os.platform        ubuntu
host.os.version         20.04.1 LTS (Focal Fossa)
log.file.path           <path>/harvest/usage-ckiqcowhu0000c2ovubpqjr9h.ndjson
log.offset              206
message                 tallers/linuxdesktop/<value>","<value>"]]},"subroutine":"<value>"}
tags                    beats_input_codec_plain_applied
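The message above looks like the tail of a longer line, so it may be worth checking whether the reported log.offset lands on a line boundary in the file on disk. Below is a rough Python sketch of that check, assuming log.offset is the byte position where the reported line starts. Both function names are hypothetical helpers, not part of Filebeat.

```python
def offset_is_line_start(path, offset):
    """True if `offset` is 0 or the byte just before it is a newline."""
    if offset == 0:
        return True
    with open(path, "rb") as handle:
        handle.seek(offset - 1)
        return handle.read(1) == b"\n"


def line_containing(path, offset):
    """Return the full line (as bytes, without the newline) that covers `offset`."""
    with open(path, "rb") as handle:
        data = handle.read()
    start = data.rfind(b"\n", 0, offset) + 1  # 0 when no earlier newline exists
    end = data.find(b"\n", offset)
    if end == -1:
        end = len(data)
    return data[start:end]
```

If `offset_is_line_start(path, 206)` returned False for the file in the event, `line_containing(path, 206)` would show the complete line that the fragment came from.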

Filebeat Details

  • Version: 7.10.0 (we saw this in 7.7.0 too)

yml config file (private details have been replaced by <>):

filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - <path>/harvest/*-*.ndjson
    backoff: 10s
    json.keys_under_root: false
    json.add_error_key: true

processors:
  - drop_event:
      when:
        contains:
          json.error.message: "Error decoding JSON: EOF"
  - rename:
      fields:
        - from: "json"
          to: "data"
  - add_fields:
      target: host
      fields:
        metrics:
          running_version: 2.0.2
  - add_host_metadata: ~
  - add_locale: ~
  - rename:
      fields:
        - from: "event.timezone"
          to: "host.timezone"
  - drop_fields:
      fields: [
          "agent.id",
          "agent.ephemeral_id",
          "agent.name",
          "agent.hostname",
          "ecs",
          "host.id",
          "host.mac",
          "host.hostname",
          "host.os.build",
          "input"
      ]
      ignore_missing: true

queue.disk:
  max_size: 256MiB
  retry_interval: 5s
  max_retry_interval: 60s

output.logstash:
  hosts:
    - <list_of_hosts>
  loadbalance: true
  compression_level: 4

Hi! As a workaround for now, adding json.ignore_decoding_error: true to your configuration will get rid of the errors. Do you know if there's any way we can reproduce this error locally?

Hi @Kaiyan_Sheng

Right now we are redirecting these errors to a different Elasticsearch index as a workaround.
Unfortunately, we do not have a way of reproducing this error locally.

The only way to reproduce it is to programmatically generate ndjson files and wait.

Also, something important to mention: when writing to the .ndjson files, we are using "line buffering" so that we flush every new line.
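To make "line buffering" concrete, here is a small Python sketch of a writer that behaves this way. The thread does not say which language the real writer uses, so this is only an illustration; `append_events` is a hypothetical name.

```python
import json


def append_events(path, events):
    """Append one JSON object per line, using a line-buffered text file."""
    # buffering=1 selects line buffering for text-mode files in Python,
    # so the buffer is flushed whenever a written string contains "\n".
    with open(path, "a", buffering=1, encoding="utf-8") as handle:
        for event in events:
            # Serialized object and newline go out in a single write() call.
            handle.write(json.dumps(event) + "\n")
```

Usage would be something like `append_events("usage-example.ndjson", [{"field_a": "x"}])`, where the file name is made up for the example.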

I found an old GitHub issue that sounds similar to what we are experiencing but it was not resolved:

filebeat container multiline error Error decoding JSON: invalid character #20053

I wrote a simple script that writes a random number of json lines (between 0 and 2K lines) to ndjson files.
I left it running and after 250K events I got an error:

Error decoding JSON: json: cannot unmarshal number into Go value of type map[string]interface {}

Event:

message:	2T0vJinis", "field_c": "lq9o6qb6E0"}
error.message: Error decoding JSON: json: cannot unmarshal number into Go value of type map[string]interface {}

Example of one json line
{"field_a": "LxffQBkR9bdsY79L1YsLRbDL", "field_b": "QmhyElp0vZMw5C0uFyGD31h", "field_c": "96CHAj8za"}

I am filling field_a, field_b, and field_c with random strings (up to 50 chars long).
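For anyone who wants to try the same thing, a script along these lines could look roughly like the Python sketch below. This is not the original script: the function names, the alphabet, and the use of line buffering are assumptions made for illustration.

```python
import json
import random
import string

ALPHABET = string.ascii_letters + string.digits  # assumed character set


def random_string(rng, max_len=50):
    """Random alphanumeric string of 1..max_len characters."""
    return "".join(rng.choice(ALPHABET) for _ in range(rng.randint(1, max_len)))


def write_random_ndjson(path, rng, max_lines=2000):
    """Write between 0 and max_lines JSON lines to `path`; return the count."""
    count = rng.randint(0, max_lines)
    # buffering=1 (line buffering) is an assumption, mirroring the writer
    # behaviour described earlier in the thread.
    with open(path, "w", buffering=1, encoding="utf-8") as handle:
        for _ in range(count):
            record = {
                name: random_string(rng)
                for name in ("field_a", "field_b", "field_c")
            }
            handle.write(json.dumps(record) + "\n")
    return count
```

Calling `write_random_ndjson(path, random.Random())` in a loop with fresh file names inside the harvested directory would approximate the test described above.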

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

ping.

So has this problem been solved?


This problem has not been solved; we still get around 2K errors like this one per week.

Good day, we are experiencing a similar problem although not related to .ndjson files. Just wanted to know if there is any way to log the json being processed at the point of the error? It is likely that there's something we're doing wrong on our end but the error message just doesn't help. Many thanks in advance!
