Sometimes beginning of log line is missing resulting in _jsonparsefailed tag

Hi,

I am on elastic stack 7.4.2 and using following pipeline:
log -> filebeat -> redis (TLS via stunnel) -> logstash-> elasticsearch

In logstash I use pipeline to pipeline communication, so that different inputs the same elasticsearch output.

My Logfile is consisting of a json log. json decoding takes place in logstash, not in filebeat.

I have following issue:
Sometimes the beginning of a log line is truncated. The rest of the line is not a valid json, so it is tagged as _jsonparsefailed in logstash.

What I found out so far:

  • I am not able to find the beginning of the line as _jsonparsefailed. So the beginning of the line seems to be lost somewhere in the processing pipeline.
  • The affected log lines are not the first line in the file (see offset).

My filebeat config looks like this:

filebeat.yml

filebeat.config:
  inputs:
    enabled: true
    path: inputs.d/*.yml
    reload.enabled: true
    reload.period: 30s
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
setup.template.settings:
  index.number_of_shards: 1
fields:
  {"log": {"sourceGroup": "staging"}}
fields_under_root: true
setup.kibana:
output.redis:
  enabled: true
  hosts: ["poc.redis.k8s-dev.local:16379"]
  key: "%{[logType]:fallback}"
  ssl.enabled: true
  ssl.verification_mode: full
  ssl.supported_protocols: [TLSv1.0, TLSv1.1, TLSv1.2]
  ssl.certificate_authorities: ["/etc/filebeat/ca.crt"]
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~

The included inputs.d looks like this:

- type: log
  enabled: true
  paths:
    - /project/jboss/*/*/log/monitoring.json.log*
  encoding: windows-1252
  fields:
    logType: generic-json
    log.format: json
  fields_under_root: true
  max_bytes: 90000000

Questions:

  • In the past I needed to increase max_bytes because I've seen the log.flag: truncated in events.
    Do I need to increase any other parameter which configures internal filebeat queues, etc?
  • How to prove that the error is made in filebeat and not in redis or logstash? Can I simultaneously dump the messages which filebeat sends to redis to a file too? Then I could search for incomplete messages in the file and see if they are correct there or truncated at the same position.
  • Any Idea why I cannot find the beginning of the messgae? In my understanding I sould see the beginning as _jsonparsedfailed tagged message also in kibana, because this also must be an incomplete json.
  • How does filebeat behave if a log line is written in two write operations, because OS or log4j buffer is full? But if that would be the problem that filebeat is splitting a message in this case into 2 or more parts, I should see the first part as incomplete message, correct?
  • Any other ideas how to analyse this error?

Thanks a lot,
Andreas

Does anyone have an idea?

Could you please share the debug logs of Filebeat? Also, if it's not a problem a few lines of logs which are processed incorrectly?

Thanks for your reply. I will change debug level on the server today and will post the log and logline when the error reoccurs.
It is happening only a few times a day.

Great. Thank you.

Hi @kvch,

after setting filebeat logs to debug level I was able to capture the issue.
In summary I would say its a strange thing. I would say we have two bugs, one in our logrotating and one in filebeat.

For me it looks like that filebeat is setting a wrong offset for the file during the log rotating. And our logrotating shows the bug, that it is renaming to an already existing file and overwrites it. Seems that this coincidence is triggering the wrong offset by filebeat.

The issue occurs when filebeat is reading the file /poliks/jboss/test7/instance-simulation-poliks/log/monitoring-root.json.log at offset 944.

Kibana:
image
Log:

So I grepped the filebeat log for offset 944.

2019-11-28T03:28:43.734+0100    DEBUG   [input] log/input.go:511        Update existing file for harvesting: /poliks/jboss/test7/instance-simulation-poliks/log/monitoring-root.json.log.2019-11-26, offset: 944
2019-11-28T03:28:53.754+0100    DEBUG   [input] log/input.go:511        Update existing file for harvesting: /poliks/jboss/test7/instance-simulation-poliks/log/monitoring-root.json.log.2019-11-26, offset: 944
2019-11-28T03:29:03.778+0100    DEBUG   [input] log/input.go:511        Update existing file for harvesting: /poliks/jboss/test7/instance-simulation-poliks/log/monitoring-root.json.log.2019-11-26, offset: 944
2019-11-28T03:29:13.796+0100    DEBUG   [input] log/input.go:511        Update existing file for harvesting: /poliks/jboss/test7/instance-simulation-poliks/log/monitoring-root.json.log.2019-11-26, offset: 944
2019-11-28T03:29:23.812+0100    DEBUG   [input] log/input.go:511        Update existing file for harvesting: /poliks/jboss/test7/instance-simulation-poliks/log/monitoring-root.json.log.2019-11-26, offset: 944
2019-11-28T03:29:33.833+0100    DEBUG   [input] log/input.go:511        Update existing file for harvesting: /poliks/jboss/test7/instance-simulation-poliks/log/monitoring-root.json.log, offset: 944
2019-11-28T03:29:33.833+0100    DEBUG   [input] log/input.go:520        Resuming harvesting of file: /poliks/jboss/test7/instance-simulation-poliks/log/monitoring-root.json.log, offset: 944, new size: 1392
2019-11-28T03:29:33.834+0100    DEBUG   [harvester]     log/harvester.go:399    Update state: /poliks/jboss/test7/instance-simulation-poliks/log/monitoring-root.json.log, offset: 944

You can see, that the offset 944 belongs to the already rotated file with suffix 2019-11-26.

Then at 2019-11-28T03:29:33.833+0100 this offset is used for the current file without suffix.
If I trace the filebeat logs for my current file I can see the following:

2019-11-28T03:29:13.797+0100    DEBUG   [input] log/input.go:511        Update existing file for harvesting: /poliks/jboss/test7/instance-simulation-poliks/log/monitoring-root.json.log, offset: 10309
2019-11-28T03:29:23.813+0100    DEBUG   [input] log/input.go:511        Update existing file for harvesting: /poliks/jboss/test7/instance-simulation-poliks/log/monitoring-root.json.log, offset: 10309
2019-11-28T03:29:33.833+0100    DEBUG   [input] log/input.go:511        Update existing file for harvesting: /poliks/jboss/test7/instance-simulation-poliks/log/monitoring-root.json.log, offset: 944
2019-11-28T03:29:33.833+0100    DEBUG   [input] log/input.go:520        Resuming harvesting of file: /poliks/jboss/test7/instance-simulation-poliks/log/monitoring-root.json.log, offset: 944, new size: 1392
2019-11-28T03:29:33.834+0100    DEBUG   [harvester]     log/harvester.go:399    Update state: /poliks/jboss/test7/instance-simulation-poliks/log/monitoring-root.json.log, offset: 944
2019-11-28T03:29:33.834+0100    DEBUG   [input] log/input.go:544        File rename was detected: /poliks/jboss/test7/instance-simulation-poliks/log/monitoring-root.json.log -> /poliks/jboss/test7/instance-simulation-poliks/log/monitoring-root.json.log.2019-11-26, Current offset: 10309
2019-11-28T03:29:33.834+0100    DEBUG   [input] log/input.go:547        Updating state for renamed file: /poliks/jboss/test7/instance-simulation-poliks/log/monitoring-root.json.log -> /poliks/jboss/test7/instance-simulation-poliks/log/monitoring-root.json.log.2019-11-26, Current offset: 10309
2019-11-28T03:29:43.852+0100    DEBUG   [input] log/input.go:511        Update existing file for harvesting: /poliks/jboss/test7/instance-simulation-poliks/log/monitoring-root.json.log, offset: 1392
2019-11-28T03:29:53.872+0100    DEBUG   [input] log/input.go:511        Update existing file for harvesting: /poliks/jboss/test7/instance-simulation-poliks/log/monitoring-root.json.log, offset: 1392
2019-11-28T03:30:03.892+0100    DEBUG   [input] log/input.go:511        Update existing file for harvesting: /poliks/jboss/test7/instance-simulation-poliks/log/monitoring-root.json.log, offset: 1392

The offset of current file was bigger before being set to 944.
One millicsec later filebeat is also detecting that a renaming from current file to the already existing file and overwrites it.
I would say filebeat is handling the renaming wrong, if renaming will overwrite / replace an existing file.

So I think you have to fix this behavior of filebeat and we have to fix our log rotating for our load generating tool.

Would you confirm my investigation?

Thanks,
Andreas

Indeed it looks like a logrotation problem. What do you use to rotate logs?
There is a guide for configuring logrotation for Filebeat, so it can read all logs properly: https://www.elastic.co/guide/en/beats/filebeat/current/file-log-rotation.html

Let me know if that helps.

In this application we are using log4j 1.2.14 with .DailyRollingFileAppender.
FileMask is set to monitoring-root.json.log* as you can see in thread opening.

I would say the recommendation of the manual is fulfilled. Also filebeat is detecting a renaming and not a truncation.

Any further ideas?