Logstash JSON parse error

I have a problem trying to parse a log file with backslashes in it. I have tried a number of escape options using the gsub() method, but without any luck. I am not a Ruby coder, so it's possible I am doing something simple wrong.

[2017-02-06T11:21:45,716][ERROR][logstash.codecs.json     ] JSON parse error, original data now in message field {:error=>#<LogStash::Json::ParserError: Unrecognized character escape 'P' (code 80)
; line: 1, column: 183]>, :data=>"{ \"time\": \"2017-02-02T20:12:22.583Z\", \"msg\": \"PP file system location: C:\\Pr\\pc\\au\\Pc0\\\" }\r"}

My Logstash filter looks like this:

filter {
    if ["msg"] {
        ruby {
            code => 'str=event.get("msg"); str=str.gsub("\x22",""").gsub("\x5C", "\"); event.set("msg", str);'
        }
    }
}

Thanks
Matt

Have you run it through the Grok debugger?

https://grokdebug.herokuapp.com/

Is that applicable when it's a JSON input?

Ohhhh good question. I am not sure. :frowning:

I suppose fixing the broken JSON is out of the question then?

Hi @magnusbaeck, I just checked the JSON but still can't spot a problem. I have edited it, as the actual JSON has some paths and lots of other properties, but the JSON above shows the issue. Anyway, as I said, I still can't see the problem with the JSON.

For some more info, if required, below is my conf file with the full input/filter/output:

input {
    file {
        path => "C:/temp/test.log"
        start_position => "beginning"
        sincedb_path => "c:\Elastic\spots.txt"
        #ignore_older => 100000000
        codec => json {
            #charset => "ISO-8859-1"
        }
        #add_field => { "pid" => 0 }
    }
}

filter {
    date {
        match => [ "time", "ISO8601", "YYYY-MM-dd HH:mm:ss", "YYYY-MM-dd HH:mm:ss.SSSS", "YYYY/MM/dd HH:mm:ss", "YYYY/MM/dd HH:mm:ss.SSSS", "YYYY-MM-dd'T'HH:mm:ss:SSSS'Z'" ]
        timezone => UTC
    }
    mutate {
        remove_field => [ "time" ]
    }
    if ["msg"] {
        ruby {
            code => 'str=event.get("msg"); str=str.gsub("\x22",""").gsub("\x5C", "\"); event.set("msg", str);'
        }
    }
}

output {
    elasticsearch {
        hosts => [ "localhost:9200" ]
        index => "logstash-209.91-%{+YYYY.MM.dd}"
        codec => "json"
        template => "C:/Elastic/logstash-conf-files/logstash.template.json"
        template_overwrite => true
    }
}

Without the template file I get errors because the mapping can't be modified after the fact.

What/How is generating the JSON?

The logs are generated by a third party app in json format.

@mattgolding

Do you know if the JSON was generated by a JSON parser/generator or crafted with concatenated strings?

Try taking a raw JSON file and giving it to http://codebeautify.org/jsonvalidator using the file browser feature.

Often in error messages or the console what you see is a representation of the JSON being fed into the LS JSON parser.

Please post your config with sensitive info removed.

Okay, so I may now be on the same wavelength as everyone who has replied so far; sorry it took so long.

I have found one of the problem lines of JSON in the input file, which looks like this:

{ "time": "2017-02-02T20:12:22.583Z", "msg": "PCCIS file system location: C:\Prizm\pccis\auto_instances\Pccis0" }

So my previous thinking was that this was going all the way through to the Elasticsearch output and the error was being thrown there. I am now thinking (I believe the same way everyone else is) that the problem is in fact during the input process: when the line is parsed as JSON, an error is thrown. The JSON validator shows this as invalid JSON as well.

So this brings me to my next question. I have no way of changing the logs as they come from an external source. Is there a way to replace a single \ with a double \ before the input tries to parse it as JSON? I assume this would fix the issue. Running a replace over the file would lose the benefit of Logstash running in the background and bringing the logs into Elasticsearch in real time.
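
(For illustration, using a shortened version of the line above: a lone backslash starts a JSON escape sequence, so \P is rejected, whereas the same line with doubled backslashes parses fine.)

    { "msg": "C:\Prizm\pccis\auto_instances\Pccis0" }       <- invalid: \P is not a JSON escape
    { "msg": "C:\\Prizm\\pccis\\auto_instances\\Pccis0" }   <- valid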

So, messing around with this a little, when I change the output to

output {
    stdout {
        codec => "json"
    }
}

The terminal shows this

{
      "path":"/Users/mattgolding/OneDrive/temp/209.91-prizm-logs/Pccis0/test.log",
      "@timestamp":"2017-02-09T05:52:05.208Z",
      "@version":"1",
      "host":"Caretakers-MacBook-Pro.local",
      "message":"{ \"time\": \"2017-02-02T20:12:22.583Z\", \"msg\": \"PCC file system location: C:\\Prizm\\pccis\\auto_instances\\Pccis0\\\" }\r",
      "tags":["_jsonparsefailure"]
}

Changing that back to the following output

output {
    elasticsearch {
        hosts => [ "localhost:9200" ]
        index => "logstash-209.91-%{+YYYY.MM.dd}"
        codec => "json"
    }
}

Now the document is going into Elasticsearch, but obviously it would be awesome if there was a way to get the JSON that is in the 'message' property into the JSON object that goes into Elasticsearch. Is there a way to do this?

The problem with trying to use gsub is that you must add a second backslash only where there is a single backslash.
So the gsub needs to find only cases of exactly one \ and change those (to a doubled backslash, or to a forward slash).
Maybe use this pattern: (?<!\\)\\(?!\\)

1. Use NO codec with the file input. You will now have the original JSON string in the message field.
2. Use the mutate filter's gsub option with the above pattern (sketched below).
3. Use the json filter to decode the message field.
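
Roughly (untested, and the number of backslashes in the pattern and replacement may need adjusting depending on your Logstash version and config escape settings), that might look like:

    input {
        file {
            path => "C:/temp/test.log"
            start_position => "beginning"
            # no codec, so the raw line lands in the "message" field
        }
    }

    filter {
        # double up any lone backslash before parsing
        mutate {
            gsub => [ "message", "(?<!\\)\\(?!\\)", "\\\\" ]
        }
        # then decode the repaired string
        json {
            source => "message"
        }
    }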

Thanks @guyboertje that helped a ton to process the problem lines.

It did have an unintended consequence of causing some lines that passed before to now fail. So I will investigate whether I can try the JSON filter first and, if it passes, all is well and move on. Otherwise (and I think I can check whether it passed by the tag it adds when the JSON parse fails) do the string replace which you showed above. Not sure if I can have two JSON filters though. I will give it a crack tonight when I get some time.

Analyse what failed; maybe the regex pattern can be improved.
But failing that, yes, you can try one JSON filter first and then have a conditional block that does the replace and a second JSON filter.
The conditional will test whether tags contains a _jsonparsefailure entry, e.g.

    ... JSON filter 1 ...
    if "_jsonparsefailure" in [tags] {
        ... replace single backslash ...
        ... JSON filter 2 ...
    }
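
A rough, untested sketch of that shape (assuming the raw line is in the message field, as in the stdout output above, and with the same caveat about backslash escaping):

    filter {
        json {
            source => "message"
        }
        if "_jsonparsefailure" in [tags] {
            mutate {
                # double up any lone backslash, then retry the parse
                gsub => [ "message", "(?<!\\)\\(?!\\)", "\\\\" ]
                # optional: clear the failure tag before the retry
                remove_tag => [ "_jsonparsefailure" ]
            }
            json {
                source => "message"
            }
        }
    }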

@guyboertje thanks for your help. I have been able to get this working with the two filters. Thanks for the help.
