I have a problem trying to parse a log file with backslashes in it. I have tried a number of escape options using the gsub() method, but without any luck. I am not a Ruby coder, so it's possible I am doing something simple wrong.
[2017-02-06T11:21:45,716][ERROR][logstash.codecs.json ] JSON parse error, original data now in message field {:error=>#<LogStash::Json::ParserError: Unrecognized character escape 'P' (code 80)
; line: 1, column: 183]>, :data=>"{ \"time\": \"2017-02-02T20:12:22.583Z\", \"msg\": \"PP file system location: C:\\Pr\\pc\\au\\Pc0\\\" }\r"}
Hi @magnusbaeck, I just checked the JSON but still can't spot a problem. I have edited it, as the actual JSON has some paths and lots of other properties, but the JSON above shows the issue. Anyway, as I said, I still can't see the problem with the JSON.
Okay, so I may now be on the same wavelength as everyone who has replied so far; sorry it took so long.
I have found one of the problem lines of JSON in the input file, which looks like this:
{ "time": "2017-02-02T20:12:22.583Z", "msg": "PCCIS file system location: C:\Prizm\pccis\auto_instances\Pccis0" }
So my previous thinking was that this line was going all the way through to the elastic output and the error was being thrown there. I am now thinking (I believe the same way everyone else is) that the problem is in fact during the input process: when the line is converted into a JSON object, an error is thrown. A JSON validator shows it as invalid JSON as well.
So this brings me to my next question. I have no way of changing the logs, as they come from an external source. Is there a way to replace a single \ with a double \\ before the input tries to convert the line to JSON? I assume this would fix the issue. Running a replace over the whole file would lose the benefit of Logstash running in the background and bringing the logs into elastic in real time.
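For what it's worth, the idea of doubling lone backslashes before parsing can be sketched in plain Ruby like this (illustration only; inside Logstash you would do the equivalent with a filter, and the `raw` string here is just the sample line from this thread):

```ruby
require 'json'

raw = '{ "msg": "PCCIS file system location: C:\Prizm\pccis\auto_instances\Pccis0" }'

# Double every backslash that is not already part of a pair,
# turning e.g. \P into \\P, which JSON accepts.
fixed = raw.gsub(/(?<!\\)\\(?!\\)/) { '\\' * 2 }

event = JSON.parse(fixed)
puts event["msg"]   # => PCCIS file system location: C:\Prizm\pccis\auto_instances\Pccis0
```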
Now the document is going into elastic, but obviously it would be awesome if there was a way to get the JSON that is in the 'message' property into the JSON object that goes into elastic. Is there a way to do this?
The problem with trying to use gsub is that you must add the second backslash only where there is a single backslash.
So the gsub needs to find only the cases of exactly one \ and then change those to a forward slash.
Maybe use this pattern: (?<!\\)\\(?!\\)
Use NO codec with the file input. You will then have the original JSON string in the message field.
Use the mutate filter's gsub option with the above pattern.
Use the JSON filter to decode the message field.
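The three steps above, applied to one event and expressed in plain Ruby rather than Logstash config (a sketch only, using the sample line from earlier in the thread):

```ruby
require 'json'

# Step 1 gives us the raw line in the message field; simulate that here.
message = '{ "time": "2017-02-02T20:12:22.583Z", "msg": "PCCIS file system location: C:\Prizm\pccis\auto_instances\Pccis0" }'

# Step 2: rewrite each lone backslash as a forward slash.
message = message.gsub(/(?<!\\)\\(?!\\)/, '/')

# Step 3: decode the now-valid JSON.
doc = JSON.parse(message)
puts doc["msg"]   # => PCCIS file system location: C:/Prizm/pccis/auto_instances/Pccis0
```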
Thanks @guyboertje, that helped a ton in processing the problem lines.
It did have an unintended consequence of causing some lines that passed before to now fail. So I will investigate whether I can use the JSON filter first: if it passes, all is well and I move on; otherwise (and I think I can check whether it passed by the tag the filter adds when the JSON parse fails) I do the string replace you showed above. Not sure if I can have two JSON filters, though. I will give it a crack tonight when I get some time.
Analyse what failed, maybe the regex pattern can be improved.
But failing that, yes, you can try one JSON filter first then have a conditional block that does the replace and second JSON filter.
The conditional will test whether [tags] contains a _jsonparsefailure entry, e.g.
json { source => "message" }
if "_jsonparsefailure" in [tags] {
  mutate {
    gsub => [ "message", "(?<!\\)\\(?!\\)", "/" ]
    remove_tag => [ "_jsonparsefailure" ]
  }
  json { source => "message" }
}