Charset not considered on HTTP input plugin


Before opening a ticket I would like to discuss my issue here.

I receive an event.message string encoded in UTF-8 even though I specify codec => plain { charset => "ASCII-8BIT" } in my config.

Here is my config:

input {
  http {
    host => ""
    port => 8080
    codec => plain { charset => "ASCII-8BIT" }
  }
}

filter {
  example {
  }
}

output {
  stdout { codec => rubydebug }
}

Here is an extract of my custom filter:

def filter(event)
  @logger.debug? && @logger.debug("The event.message size is: #{event.get("message").size}")
  @logger.debug? && @logger.debug("The event.message encoding is: #{event.get("message").encoding}")

  counter = 0
  event.get("message").each_byte { |c|
    # Increments the counter for each byte within the string
    counter += 1
  }
  @logger.debug? && @logger.debug("There are #{counter} bytes in the string")

  # filter_matched should go in the last line of our successful code
  filter_matched(event)
end # def filter

And here is the output (I expected "The event.message encoding is: ASCII-8BIT"):

filter received {:event=>{"message"=>"H\u0000\u0002\u0001\a\u0000\u0000\u0000\u0001\u0002\u0003\u0004\u0005C&\v\u0000\u0000\u0000\u0000\u0000\u0000�F\u0000\u0000"\v\u0000", "@version"=>"1", "@timestamp"=>"2016-10-11T23:32:52.277Z", "host"=>"", "headers"=>{"request_method"=>"POST", "request_path"=>"/", "request_uri"=>"/", "http_version"=>"HTTP/1.1", "http_user_agent"=>"Mozilla/4.0 (compatible; AP:FiOS-Mercury/; PL:Motorola-DCT/KA15.76.12.19AlderF.560; BX:VMS1100; UA:0000108336906021; U; en-US)", "http_host"=>"", "http_accept"=>"/", "content_type"=>"application/x-www-form-urlencoded", "content_length"=>"2880"}}, :level=>:debug, :file=>"(eval)", :line=>"41", :method=>"filter_func"}
The event.message size is: 2880 {:level=>:debug, :file=>"logstash/filters/example.rb", :line=>"18", :method=>"filter"}
The event.message encoding is: UTF-8 {:level=>:debug, :file=>"logstash/filters/example.rb", :line=>"19", :method=>"filter"}
There are 2898 bytes in the string {:level=>:debug, :file=>"logstash/filters/example.rb", :line=>"27", :method=>"filter"}

I just figured out this is actually a feature implemented in LogStash::Util::Charset:

def convert(data)
  # NON UTF-8 charset declared.
  # Let's convert it (as cleanly as possible) into UTF-8 so we can use it with JSON, etc.
  return data.encode(Encoding::UTF_8, :invalid => :replace, :undef => :replace) unless @charset_encoding == Encoding::UTF_8
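This conversion also explains why the filter counts 2898 bytes in a 2880-byte payload: every byte that is invalid or unmappable in UTF-8 is replaced by U+FFFD, which occupies three bytes, so the string grows. A minimal sketch of the same encode call in plain Ruby, with made-up byte values:

```ruby
# A 3-byte binary string; \xEF on its own has no UTF-8 mapping.
raw = "H\xEF\x00".b                 # .b forces ASCII-8BIT (binary) encoding
raw.bytesize                        # => 3

# Same call LogStash::Util::Charset#convert makes:
utf8 = raw.encode(Encoding::UTF_8, :invalid => :replace, :undef => :replace)
utf8.encoding                       # => #<Encoding:UTF-8>
utf8.bytesize                       # => 5 ("\xEF" became the 3-byte U+FFFD)
```

So the original bytes are not preserved; the replacement is lossy, which is why inspecting the message at filter level no longer shows the raw payload.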

I might be missing something, but it would be great if we could specify some kind of 'keep_original_charset' option; this would allow handling arbitrary binary protocols at the filter level.

Just in case someone hits a similar problem: I solved the issue by coding a custom codec. Inside its decode method you have the data in its original encoding, and you can do pretty much whatever you want with it (parse it, create a string array of hex values, ...).

Here is an extract:

def decode(data)
  array_data = data.unpack('C*')

  header_char = array_data.shift(1).pack('C*')                          #1 My first byte
  header_version_number = getnumber_frombytes(array_data.shift(1))      #2 My second byte
  header_platform_id_number = getnumber_frombytes(array_data.shift(1))  #3 My third byte
  header_isextended_number = getnumber_frombytes(array_data.shift(1))   #4 My fourth byte
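The unpack/shift/pack pattern above can be tried in plain Ruby outside of Logstash. In this sketch, getnumber_frombytes is a hypothetical stand-in for the helper used in the extract, assumed to read a one-element byte array as an Integer, and the header bytes are made up:

```ruby
# Hypothetical stand-in for the helper in the extract above.
def getnumber_frombytes(bytes)
  bytes.first
end

data = "H\x02\x01\x07".b               # made-up 4-byte binary header
array_data = data.unpack('C*')         # => [72, 2, 1, 7]

header_char = array_data.shift(1).pack('C*')                      # => "H"
header_version_number = getnumber_frombytes(array_data.shift(1))  # => 2
```

Because decode receives the raw bytes before any charset conversion, unpack('C*') sees the payload exactly as it arrived on the wire.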