Json_parse_exception Illegal character CTRL-CHAR


(Alex J) #1

Hi, I was trying to set up shipping of a JSON log to Elasticsearch via Filebeat and received this strange error. Now it won't go away. The error is:

{
"level":"error",
"timestamp":"2018-11-15T05:30:08.926Z",
"caller":"elasticsearch/client.go:317",
"message":"Failed to perform any bulk index operations:
  500 Server Error: {
    \"error\":{
      \"root_cause\": [
        {\"type\":\"json_parse_exception\",\"reason\":\"Illegal character ((CTRL-CHAR, code 31)): only regular white space (\\\\r, \\\\n, \\\\t) is allowed between tokens\\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@37dd17fb; line: 1, column: 2]\"}
      ],
    \"type\":\"json_parse_exception\",
    \"reason\":\"Illegal character ((CTRL-CHAR, code 31)): only regular white space (\\\\r, \\\\n, \\\\t) is allowed between tokens\\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@37dd17fb; line: 1, column: 2]\"},
    \"status\":500}"
}

The config I was trying to add to filebeat.yml was:

filebeat.inputs:
  - type: log
    enabled: false
    paths:
      - /var/log/httpd/mylog.json
    json:
      keys_under_root: true
      add_error_key: true

I found it really hard to debug this error. It seems to be related to bad input, but it remained even when I removed the registry file and the new input. Eventually I restarted Filebeat with only one input (/var/log/filebeat/filebeat), and the error went away.

So the error is gone now, but how can I debug this in future? It seems like one log line was invalid somehow, but I couldn't see the source.
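For future debugging, one way to spot such lines before (or after) shipping is to scan the file for non-whitespace control characters with GNU grep. A minimal sketch (the path and sample data are illustrative, not from the actual log):

```shell
# Create a sample file containing CTRL-CHAR 31 (octal \037) on line 2
printf 'good line\n\037bad line\n' > /tmp/mylog.json
# -P enables Perl-style regex so \x escapes work; -n prints the line number
# The class matches control characters except \t, \n, \r
grep -nP '[\x00-\x08\x0b\x0c\x0e-\x1f]' /tmp/mylog.json
```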

PS, I'm using Filebeat 6.4 with Elasticsearch 6.3.


(Steffen Siering) #2

We should try to escape this control character when encoding to JSON. Can you please open a bug report?

Is your input json file utf-8 or utf-16 encoded?

Do you get the same error if you disable json parsing, but try to index the file as is?

Using exclude_lines: ["\u001f"], plus the JSON decoding processor instead of the json settings in the input, you should be able to filter out events containing control character 31.
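A sketch of what that could look like in filebeat.yml (Filebeat 6.x option names, assuming the raw line lands in the default message field):

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/httpd/mylog.json
    # Drop lines containing CTRL-CHAR 31 (unit separator)
    exclude_lines: ["\u001f"]

processors:
  - decode_json_fields:
      fields: ["message"]   # decode the raw log line
      target: ""            # merge decoded keys into the event root
      add_error_key: true   # tag events that fail to decode
```

Note that with add_error_key, lines that fail JSON decoding are still indexed with an error field rather than silently dropped, which may fit an audit-trail requirement better than excluding them outright.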


(Alex J) #3

Thanks. I'd rather not filter out these lines, because the system is meant to be an audit trail. If they could be dealt with by logging an error event instead, that would at least be acceptable.

I've opened a bug: https://github.com/elastic/beats/issues/9120

Thank you for the response!


(Alex J) #4

Can you explain in more detail how to do this? I can't figure out how to handle our log format with the JSON decoding processor. It doesn't seem to support decoding the entire log line, only a subset of the event's fields?

PS, I suspect the reason for the bad data is that the log lines are being generated by Apache using a mangled LogFormat:

LogFormat "{ \"clientip\": \"%a\", \"duration_ms\": %{ms}T, ...}" access_json
{ "timestamp": "2018-11-16T00:04:20.933", "clientip": "1.2.3.4", "duration_ms": 97, "status": 200, "request": "/", "request-path": "/", "request-query": "", "method": "GET", "protocol": "HTTP/1.1", "response_bytes": 9804, "headers": { "referer": "-", "user-agent": "ELB-HealthChecker/2.0", "authorization": "-", "x-amzn-trace-id": "-" } }

We want to include more fields than the default apache2 combined log, so we can't use the apache2 module. But if you have another approach I would be happy to hear it.
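If that LogFormat is the cause, the mechanism would be that Apache substitutes values like %a into the format string verbatim, so any control character in a logged value ends up unescaped inside the JSON line, which a strict parser then rejects. A minimal simulation (illustrative values only, not the actual log):

```shell
# Build a fake log line with an unescaped CTRL-CHAR 31 (octal \037) in a string
line=$(printf '{ "clientip": "1.2.3.4", "request": "/\037bad" }')
# A strict JSON parser rejects unescaped control characters
echo "$line" | python3 -c 'import json,sys; json.load(sys.stdin)' \
  || echo "invalid JSON"
```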


(Alex J) #5

I managed to replicate this error. I'm not sure what's happening, but it occurs even with an input configuration that doesn't involve JSON at all:

  - type: log
    enabled: true
    paths:
      - /fake

The full debug output from Filebeat is here: https://gist.github.com/alexjurkiewicz/da219f1bb1d191db455e983d3a7a961a

I tried twice and hit the same error both times.

I added a single line to the file with echo hi >>/fake and then started Filebeat; it immediately failed with this error.

I am really confused now :thinking:


(Steffen Siering) #6

That's weird. Unfortunately the control character is 'silent': you're unlikely to see it in the debug output. I wonder whether the special character actually comes from the original file or is introduced by one of the other custom fields.

It's great that you have such a simplified test case. Can you capture a pcap file with tcpdump? That would let us examine the actual raw bytes being sent at the network level.


(Alex J) #7

Update: disabling compression for my Elasticsearch output seems to have made this issue go away.
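For anyone else hitting this, the relevant knob is compression_level on the Elasticsearch output; a sketch (the host is illustrative):

```yaml
output.elasticsearch:
  hosts: ["localhost:9200"]
  # 0 disables gzip compression of bulk requests; 1-9 enable increasing levels
  compression_level: 0
```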