Logstash+Kafka charset issue


(Daniel Whelan) #1

I'm sending JSON-formatted logs along the path Logstash -> HTTP endpoint -> Kafka -> Logstash -> Elasticsearch. The logs appear to be formatted correctly until they are ingested by the second Logstash process. I've checked with another Kafka consumer and the data looks fine there. Any ideas where things are getting mangled and how I can avoid it?

Log rewritten by logstash client: {"@timestamp":"2015-09-18T16:41:56.706Z","@source_host":"aws-mgmt-monitor-riemann-i-689515b8","@message":"Sep 18 16:41:56 dwhelan: test2","@fields":{"facility":"user","severity":"notice","program":"dwhelan","processid":"-","message":" test2"},"@version":"1","host":"aws-mgmt-monitor-riemann-i-689515b8.xyzxyz-mgmt.com","path":"/var/log/json","type":"syslog"}
Log written by logstash server: {"message":":sleeping:\u0002\u0000\u0000\u0000\u0017\u0000\u0000\u00019\xF1\b:)\n\u0001\xFA\x87received%\u0001'x)~%\xB2\xFB\u0017\u0000\xF4W\x89@timestampW2015-09-18T16:35:52.678Z\x8B@source_hostbaws-mgmt-monitor-riemann-i-689515b8\x87@message]Sep 18 O\u0000\xF30 dwhelan: test4\x86@fields\xFA\x87facilityCuser\x87severityEnotice\x86programF>\u0000”ąprocessid@-\x86j\u0000\u0012ER\u0000\xD0\xFB\x87@version@1\x83\xAF\u0000\u001Fr\xAF\u0000\u0010q.xyzxyz\xD6\u0000\xF0\u0015.com\x83pathL/var/log/json\x83typeEsyslog\xFB\u00160\xE9\xEA","@version":"1","@timestamp":"2015-09-18T16:35:53.659Z"}

Client config:
input {
  file {
    path => "/var/log/json*"
    exclude => "*.gz"
    type => "syslog"
    codec => "json"
    start_position => "beginning"
  }
}
output {
  http {
    content_type => "application/json; charset=utf-8"
    http_method => "post"
    url => "http://rt-metrics.xyxyz.com/events/logs"
    headers => ["Authorization", "Basic AUTHSTRINGGOESHERE"]
  }
}

Server config:
input {
  kafka {
    consumer_threads => 1
    topic_id => "logs"
    zk_connect => "zk.xyzxyz.com:2181/logs"
    codec => json {
      charset => "UTF-8"
    }
  }
}
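
To narrow down which hop changes the payload, a check like the one below could be run against the raw bytes of a single message taken from the topic. This is only a sketch: `classify_payload` is a hypothetical helper, how the bytes are fetched is left open, and the Smile check is a guess based on the `:)\n` sequence visible in the mangled output above, which matches the header of the Smile binary JSON format.

```python
import json

# Assumption: a Smile (binary JSON) document starts with these three bytes.
SMILE_HEADER = b":)\n"


def classify_payload(raw: bytes) -> str:
    """Return a rough label for one raw message payload (hypothetical helper)."""
    try:
        # UnicodeDecodeError and JSONDecodeError are both ValueError subclasses.
        json.loads(raw.decode("utf-8"))
        return "json"
    except ValueError:
        pass
    # Look near the start only; the sample above has a few bytes of framing
    # in front of the header.
    if SMILE_HEADER in raw[:64]:
        return "smile"
    return "other"
```

If a message from the topic comes back as anything other than `json`, the `json` codec on the Kafka input would be receiving bytes it cannot parse, which would suggest looking at how the HTTP endpoint serializes events before producing them.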

