I'm using Beats, Logstash, and Elasticsearch to read log info in JSON, add some fields, and index it. Filebeat is set up to treat each line in the logs as an event, and each JSON object should be on a single line. I noticed that some documents stored in Elasticsearch didn't have the right info, and after investigating I found documents that looked like this...
{
"_index": "testresults",
"_type": "doc",
"_id": "ziVky2YBWK0nRFtEhD4U",
"_score": 1,
"_source": {
"tags": [
"_jsonparsefailure",
"_elasticsearch_lookup_failure"
],
"runNumber": 1,
"firstFail": null,
"message": """.4651157-04:00","jobID":"PL622"}"""
}
}
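For context, the relevant part of my Logstash pipeline looks roughly like this (a sketch, not my exact config; the port and index name match my setup below, everything else is illustrative):

```
input {
  beats {
    port => 5044
  }
}
filter {
  # Parse each Filebeat message as JSON. When parsing fails, Logstash
  # adds the _jsonparsefailure tag seen in the bad documents above.
  json {
    source => "message"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "testresults"
  }
}
```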
This is Logstash saying that the message sent to it from Filebeat is not valid JSON, which it isn't. At this point I assumed the logs had carriage returns in the middle of the JSON object, so I turned on debug logging in Filebeat and started watching for a bad message. And I found one...
2018-10-31T15:15:36.979-0400 DEBUG [publish] pipeline/processor.go:275 Publish event: {
"@timestamp": "2018-10-31T19:15:36.979Z",
"@metadata": {
"beat": "filebeat",
"type": "doc",
"version": "6.2.4"
},
"offset": 10316,
"message": "}",
"source": "d:\autotestdata\filebeat\PL622_tsrx-29.json",
"prospector": {
"type": "log"
},
"beat": {
"name": "jarvis",
"hostname": "jarvis",
"version": "6.2.4"
}
}
In this case everything except the final "}" of the JSON has been cut off. So I looked at the contents of PL622_tsrx-29.json, but found that the entire JSON object was on a single line. I then copied that line, pasted it as a second line in the file, and saved it; Filebeat saw that the file had changed and processed the new line correctly. So I conclude that there is nothing wrong with the logs or with how I've set up Filebeat; for some reason, Filebeat occasionally seems to mangle the message. Here's how I have Filebeat configured...
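To rule out bad source data more systematically, I can check that every line in the files parses as standalone JSON with a quick script like this (a sketch; the file path is passed on the command line):

```python
import json
import sys

def find_bad_lines(path):
    """Return (line_number, error) for every non-empty line that is not valid JSON."""
    bad = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                json.loads(line)
            except json.JSONDecodeError as e:
                bad.append((lineno, str(e)))
    return bad

if __name__ == "__main__":
    for lineno, err in find_bad_lines(sys.argv[1]):
        print(f"line {lineno}: {err}")
```

Running this over the .json files reports nothing, which is consistent with the files themselves being fine.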
filebeat.prospectors:
- type: log
  enabled: true
  paths:
    - d:\autotestdata\filebeat\*.json
filebeat.config.modules:
  path: C:\Program Files\Filebeat\*.yml
  reload.enabled: false
output.logstash:
  hosts: ["localhost:5044"]
logging.level: debug
I've looked at my data in Elasticsearch, and it looks like about 1% of my log info is being dropped due to this problem. Given its intermittent nature, I don't believe I can provide a test case that reliably reproduces it. Is this a known issue? Can someone offer guidance on how to work around it?
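For reference, this is roughly how I arrived at the ~1% figure: counting documents carrying the _jsonparsefailure tag against the total via the _count API (a sketch; the URL and index name match my setup, and the query shape assumes the tag Logstash adds on parse failure):

```python
import json
from urllib.request import Request, urlopen

def failure_query():
    """Query body matching documents tagged with _jsonparsefailure."""
    return {"query": {"term": {"tags": "_jsonparsefailure"}}}

def drop_percentage(failed, total):
    """Percent of indexed documents that failed JSON parsing."""
    return 100.0 * failed / total if total else 0.0

def count(es_url, index, body=None):
    """Hit the Elasticsearch _count API; body=None counts all documents."""
    data = json.dumps(body or {}).encode()
    req = Request(f"{es_url}/{index}/_count", data=data,
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)["count"]

if __name__ == "__main__":
    es = "http://localhost:9200"
    failed = count(es, "testresults", failure_query())
    total = count(es, "testresults")
    print(f"{drop_percentage(failed, total):.2f}% of documents failed to parse")
```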
Thanks,
Chris