Hi, I was trying to set up Filebeat to ship a JSON log to Elasticsearch and received this strange error, which then wouldn't go away. The error is:
{
"level":"error",
"timestamp":"2018-11-15T05:30:08.926Z",
"caller":"elasticsearch/client.go:317",
"message":"Failed to perform any bulk index operations:
500 Server Error: {
\"error\":{
\"root_cause\": [
{\"type\":\"json_parse_exception\",\"reason\":\"Illegal character ((CTRL-CHAR, code 31)): only regular white space (\\\\r, \\\\n, \\\\t) is allowed between tokens\\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@37dd17fb; line: 1, column: 2]\"}
],
\"type\":\"json_parse_exception\",
\"reason\":\"Illegal character ((CTRL-CHAR, code 31)): only regular white space (\\\\r, \\\\n, \\\\t) is allowed between tokens\\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@37dd17fb; line: 1, column: 2]\"},
\"status\":500}"
}
The config I was trying to add to filebeat.yml was:
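something along these lines, where the path and the json options are illustrative placeholders rather than the exact values:

filebeat.inputs:
- type: log
  paths:
    - /var/log/myapp/audit.json   # placeholder path for the new JSON log
  json.keys_under_root: true      # decode each line as JSON and put the keys at the top level of the event
  json.add_error_key: true        # add an error field to the event if JSON decoding fails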
I found it really hard to debug this error. It seems to be related to bad input, but it remained even when I removed the registry file and the new input. Eventually I restarted Filebeat with only one input (/var/log/filebeat/filebeat), and the error went away.
So the error is gone now, but how can I debug this in future? It seems like one log line was invalid somehow, but I couldn't see the source.
PS, I'm using filebeat 6.4 with elasticsearch 6.3.
We should try to escape this control code when encoding to JSON. Can you please open a bug report?
Is your input JSON file UTF-8 or UTF-16 encoded?
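On Linux you can check with something like this (the path is a placeholder):

file -i /var/log/myapp/audit.json
# prints something like "text/plain; charset=utf-8" (or charset=utf-16le)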
Do you get the same error if you disable json parsing, but try to index the file as is?
Using exclude_lines: ["\u001f"] in the input, together with the decode_json_fields processor instead of the input's json settings, you should be able to filter out events containing control character 31.
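A minimal sketch of that combination, assuming the raw line is shipped in the message field; the path is a placeholder:

filebeat.inputs:
- type: log
  paths:
    - /var/log/myapp/audit.json   # placeholder path
  exclude_lines: ["\u001f"]       # drop any line containing control character 31

processors:
- decode_json_fields:
    fields: ["message"]           # decode the raw log line shipped in the message field
    target: ""                    # put the decoded keys at the top level of the event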
Thanks. I'd rather not filter out these lines because the system is meant to be an audit trail. If they could be dealt with by logging an error event instead that would be acceptable at least.
Can you explain in more detail how to do this? I can't figure out how to define our log format using the json decoding processor. It doesn't seem to support decoding the entire log line, but only a subset of it?
PS, I suspect the reason for the bad data is that the log lines are being generated by Apache using a mangled LogFormat:
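something along the lines of the sketch below, where the field choices and names are only illustrative placeholders, not the actual directive. As far as I can tell, Apache does not JSON-escape the substituted values, so stray bytes in request fields could end up producing lines that are not valid JSON.

# Illustrative JSON-style access log format (placeholder fields and names)
LogFormat "{ \"time\":\"%{%Y-%m-%dT%H:%M:%S%z}t\", \"remote_ip\":\"%a\", \"method\":\"%m\", \"request\":\"%U%q\", \"status\":%>s, \"bytes\":%B, \"user_agent\":\"%{User-Agent}i\" }" json_audit
CustomLog /var/log/apache2/access_json.log json_audit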
We want to include more fields than the default apache2 combined log, so we can't use the apache2 module. But if you have another approach I would be happy to hear it.
That's weird. Unfortunately the control character is 'silent', meaning you're unlikely to see it in the debug output. I wonder whether the special character actually comes from the original file or whether it's introduced by one of the other custom fields.
It's great that you have such a simplified test case. Can you capture a pcap file with tcpdump? That would let us examine the actual raw bytes being sent at the network level.
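Something like the following on the Filebeat host would do it; the address and port are placeholders for your Elasticsearch endpoint, and the capture is only readable if the connection is not TLS-encrypted:

tcpdump -i any -w filebeat-to-es.pcap host 10.0.0.5 and tcp port 9200
# reproduce the failed bulk request, then stop the capture with Ctrl-C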