JSON parser failure when the message contains emoji Unicode characters

Hi there.

Logstash logs a _jsonparsefailure when the message contains some emojis.

{
    "@timestamp" => 2019-04-03T14:22:08.147Z,
      "@version" => "1",
       "message" => "{\\\"event\\\":\\\"WhatsAppMessagesLog\\\",\\\"body\\\":{\\\"protocol\\\":\\\"155430118273\\\",\\\"route_number\\\":\\\"554197697084\\\",\\\"created_at\\\":\\\"2019-04-03 11:21:26\\\",\\\"from\\\":\\\"554197697084\\\",\\\"message_type\\\":\\\"text\\\",\\\"to\\\":\\\"554188220551\\\",\\\"event\\\":\\\"WEON_SEND\\\",\\\"type\\\":\\\"send\\\",\\\"uniqueid\\\":\\\"201904031554301286584.40320\\\",\\\"content\\\":\\\"\\xED\\xA0\\xBE\\xED\\xB4\\xB1\\xED\\xA0\\xBC\\xED\\xBF\\xBB  \\xED\\xA0\\xBD\\xED\\xB1\\xAE\\xED\\xA0\\xBC\\xED\\xBF\\xBB‍♀️  \\xED\\xA0\\xBD\\xED\\xB1\\xAE\\xED\\xA0\\xBC\\xED\\xBF\\xBB‍♀️  \\xED\\xA0\\xBD\\xED\\xB1\\xAE\\xED\\xA0\\xBC\\xED\\xBF\\xBB‍♀️\\\"}}",
          "tags" => [
        [0] "_jsonparsefailure"
    ]
}

My Logstash conf:
input {
        stomp {
                id => "idPrimeiro"
                host => "10.158.0.4"
                destination => "WEL"
                codec => "json"
        }
}
output {
        elasticsearch {
                hosts => ["localhost:9200"]
                index => "qualifications"
        }
        stdout {
                codec => rubydebug
        }
        file {
                path => "/var/log/logstash/stomp.log"
                codec => rubydebug
        }
}

I've tried several different codecs and charsets on the input chain, with no success at all.

The messages come from ActiveMQ, which correctly posts them to the Logstash queue.

Having the wrong charset could be the issue. There should be an "exception=>#<LogStash::Json::ParserError:" with a more specific error message in the Logstash log. What is the error message?

[2019-04-03T15:56:12,505][WARN ][logstash.codecs.json ] Received an event that has a different character encoding than you configured. {:text=>"{\\\"event\\\":\\\"WhatsAppMessagesLog\\\",\\\"body\\\":{\\\"protocol\\\":\\\"155429559695\\\",\\\"route_number\\\":\\\"554197780247\\\",\\\"created_at\\\":\\\"2019-04-03 15:56:12\\\",\\\"from\\\":\\\"554192078513\\\",\\\"message_type\\\":\\\"text\\\",\\\"to\\\":\\\"554197780247\\\",\\\"event\\\":\\\"INBOX\\\",\\\"type\\\":\\\"receipt\\\",\\\"uniqueid\\\":\\\"201904031554317772385.43090\\\",\\\"content\\\":\\\"\\xED\\xA0\\xBD\\xED\\xB8\\x80\\xED\\xA0\\xBD\\xED\\xB8\\x80\\xED\\xA0\\xBD\\xED\\xB8\\x80\\xED\\xA0\\xBD\\xED\\xB8\\x80\\\"}}", :expected_charset=>"UTF-8"}

[2019-04-03T15:56:12,631][ERROR][logstash.codecs.json ] JSON parse error, original data now in message field {:error=>#<LogStash::Json::ParserError: Unexpected character ('\' (code 92)): was expecting double-quote to start field name

 at [Source: (String)"{\"event\":\"WhatsAppMessagesLog\",\"body\":{\"protocol\":\"155429559695\",\"route_number\":\"554197780247\",\"created_at\":\"2019-04-03 15:56:12\",\"from\":\"554192078513\",\"message_type\":\"text\",\"to\":\"554197780247\",\"event\":\"INBOX\",\"type\":\"receipt\",\"uniqueid\":\"201904031554317772385.43090\",\"content\":\"\xED\xA0\xBD\xED\xB8\x80\xED\xA0\xBD\xED\xB8\x80\xED\xA0\xBD\xED\xB8\x80\xED\xA0\xBD\xED\xB8\x80\"}}"; line: 1, column: 3]>, :data=>"{\\\"event\\\":\\\"WhatsAppMessagesLog\\\",\\\"body\\\":{\\\"protocol\\\":\\\"155429559695\\\",\\\"route_number\\\":\\\"554197780247\\\",\\\"created_at\\\":\\\"2019-04-03 15:56:12\\\",\\\"from\\\":\\\"554192078513\\\",\\\"message_type\\\":\\\"text\\\",\\\"to\\\":\\\"554197780247\\\",\\\"event\\\":\\\"INBOX\\\",\\\"type\\\":\\\"receipt\\\",\\\"uniqueid\\\":\\\"201904031554317772385.43090\\\",\\\"content\\\":\\\"\\xED\\xA0\\xBD\\xED\\xB8\\x80\\xED\\xA0\\xBD\\xED\\xB8\\x80\\xED\\xA0\\xBD\\xED\\xB8\\x80\\xED\\xA0\\xBD\\xED\\xB8\\x80\\\"}}"}

Code 92 is a backslash. So it seems that instead of something like

{ "foo": "bar" }

you have

{ \"foo\": \"bar\" }

Can you edit your post, select the text of the message and the sample data, and click on </> in the toolbar above the edit panel? That will blockquote the text and preserve the escaping etc.

Thanks Badger, done.

The input is clearly not UTF-8. Try charset ASCII-8BIT, see if that helps.
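
For reference, a minimal sketch of what that suggestion might look like on the stomp input from the config above, using the json codec's charset option. Whether it actually resolves the problem is not confirmed in this thread. As a side note, the sequence \xED\xA0\xBD\xED\xB8\x80 shown in the log is what a UTF-16 surrogate pair looks like when each half is encoded as its own three-byte sequence, which a strict UTF-8 decoder rejects.

```
input {
        stomp {
                id => "idPrimeiro"
                host => "10.158.0.4"
                destination => "WEL"
                # Assumption: declare the incoming bytes as binary instead of the default UTF-8
                codec => json { charset => "ASCII-8BIT" }
        }
}
```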

Your JSON is not valid JSON, because all the quotes are escaped. This is a rather blunt approach, but you could try

mutate { gsub => [ "message", '\\"', '"' ] }
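
To spell that out a bit: the gsub only helps if it runs before the JSON is parsed, and a codec on the input runs before any filter does. One way to arrange that (a sketch, not tested against this data) is to switch the input codec to plain and do the parsing in the filter section with the json filter:

```
input {
        stomp {
                host => "10.158.0.4"
                destination => "WEL"
                # Keep the raw text in "message" instead of parsing it on input
                codec => plain
        }
}
filter {
        # Turn every \" back into " (blunt: it also rewrites quotes that were escaped inside values)
        mutate { gsub => [ "message", '\\"', '"' ] }
        # Parse the cleaned-up text; failures are still tagged _jsonparsefailure
        json { source => "message" }
}
```

The charset question from the earlier reply still applies here, since the plain codec also defaults to UTF-8.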
