Filebeats cuts off first part of log event

hyattcs · November 1, 2018, 5:45pm

I'm using beats, logstash, elasticsearch to read log info in json, add some fields and index it. Beats is set up to treat a single line in the logs as an event, and each json object should be on a single line. I noticed that there were some documents stored in elasticsearch that didn't have the right info, and after investigating found documents that looked like this...

  {
    "_index": "testresults",
    "_type": "doc",
    "_id": "ziVky2YBWK0nRFtEhD4U",
    "_score": 1,
    "_source": {
      "tags": [
        "_jsonparsefailure",
        "_elasticsearch_lookup_failure"
      ],
      "runNumber": 1,
      "firstFail": null,
      "message": """.4651157-04:00","jobID":"PL622"}"""
    }
  }

This is from logstash basically saying that the message sent to it from filebeats is not valid json, which it isn't. At this point, I assumed that the logs had carriage returns in the middle of the json object, so I turned on debug logging in beats and started watching for a bad message. And I found one...

2018-10-31T15:15:36.979-0400 DEBUG [publish] pipeline/processor.go:275 Publish event: {
"@timestamp": "2018-10-31T19:15:36.979Z",
"@metadata": {
"beat": "filebeat",
"type": "doc",
"version": "6.2.4"
},
"offset": 10316,
"message": "}",
"source": "d:\autotestdata\filebeat\PL622_tsrx-29.json",
"prospector": {
"type": "log"
},
"beat": {
"name": "jarvis",
"hostname": "jarvis",
"version": "6.2.4"
}
}

In this case the entire json except for the last "}" has been cut off. So I looked at the contents of PL622_tsrx-29.json. But I found that the entire json object was on a single line. So I copied that line to the second line in the file and saved it. Beats saw that the file had changed and behaved correctly. So I conclude that there is nothing wrong with the logs or how I've set up beats. For some reason, beats seems to occasionally mangle the message. Here's how I have beats configured...

filebeat.prospectors:

type: log
enabled: true
paths:
- d:\autotestdata\filebeat\*.json

filebeat.config.modules:
path: C:\Program Files\Filebeat>*.yml
reload.enabled: false

output.logstash:
hosts: ["localhost:5044"]

logging.level: debug

I've looked at my data in elasticsearch, and it looks like about 1% of my log info is getting dropped due to this problem. Given the problem's indeterminate nature, I don't believe I can provide a test case that will reliably reproduce the problem. Is this a known issue? Can someone provide some guidance on how I can work around this problem?

Thanks,
Chris

jsoriano · November 2, 2018, 12:05pm

Hi @hyattcs and welcome

If the log files contain one JSON object per line they can be directly parsed with filebeat, take a look to the json options.

In any case it is weird to see that it is collecting incomplete lines. How are you writing to these files? Is it possible that it is truncating the file from time to time?

hyattcs · November 2, 2018, 2:23pm

Hi Jaime, thanks for getting back. I didn't know about the json parsing options in filebeat. I'll look in to that.

There's a separate process running on another machine that's writing the files. The file is not being truncated. I've traced several of the failures back to their origin file, and the data in the file is complete.

I've done an internet search, and it looks like this is not an isolated incident. Others have seen the same problem...

hyattcs · November 5, 2018, 6:26pm

I've been thinking about the possibility of a race condition between the file being written and filebeat reading the file. Is there a way in the filebeat configuration to set up delay between the file change detection and the initial file read? I just went through the prospector setup, and I didn't see anything obvious.

hyattcs · November 14, 2018, 8:34pm

I've been able to determine that this is not a problem with filebeat. Turns out we were creating multiple log files with the same name. The first file would get saved without issue. Then the second file would override the first. If the second was bigger, filebeat would start where it left off with the first, which has the effect of cutting off the first half of the second file.

system · December 12, 2018, 10:34pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
One line JSON parsing into separate events/logs Beats filebeat	2	423	August 15, 2019
Filebeat is putting whole JSON object into one field Beats filebeat	2	1242	April 28, 2020
How does filebeat (or logstash) converts single line of log to json data? Beats	3	1997	June 8, 2017
Problem with JSON logs Beats filebeat	3	561	March 9, 2019
Filebeat truncates output data Beats filebeat	8	1976	January 2, 2020

Filebeats cuts off first part of log event

Related topics