Filter Error ASCII-8BIT to UTF-8

Hi, I got the following error when running Logstash with the JDBC plugin to fetch logs from an Oracle database.

Mar 13 17:09:31 exabeamlms01 docker[10496]: [2018-03-13T17:09:31,010][ERROR][logstash.filters.ruby ] Ruby exception occurred: "\xA6" from ASCII-8BIT to UTF-8
Mar 13 17:09:35 exabeamlms01 docker[10496]: [2018-03-13T17:09:35,997][ERROR][logstash.filters.ruby ] Ruby exception occurred: "\xCE" from ASCII-8BIT to UTF-8
Mar 13 17:09:39 exabeamlms01 docker[10496]: [2018-03-13T17:09:39,606][ERROR][logstash.filters.ruby ] Ruby exception occurred: "\xB4" from ASCII-8BIT to UTF-8
Mar 13 17:09:39 exabeamlms01 docker[10496]: [2018-03-13T17:09:39,884][ERROR][logstash.filters.ruby ] Ruby exception occurred: "\xB7" from ASCII-8BIT to UTF-8

Here's my config for the jdbc input:
input {
  jdbc {
    codec => line
    jdbc_driver_library => "/opt/jdbc-drivers/ojdbc8.jar"
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    jdbc_connection_string => "jdbc:oracle:thin:@(DESCRIPTION=(CONNECT_TIMEOUT=1)(TRANSPORT_CONNECT_TIMEOUT=1)(RETRY_COUNT=1)(ADDRESS_LIST=(LOAD_BALANCE=off)(FAILOVER=on)(ADDRESS=(HOST=wittet)(PORT=1521)(PROTOCOL=TCP))(ADDRESS=(HOST=exa03-scan.wittet)(PORT=1521)(PROTOCOL=TCP)))(CONNECT_DATA=(SERVICE_NAME=sedbsrv.db)(FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC))))"
    jdbc_user => "XXXX"
    jdbc_password => "XXX"
    schedule => "* * * * *"
    statement => "SELECT * from unified_audit_trail WHERE Event_timestamp > :sql_last_value"
    use_column_value => false
    tracking_column => "ORIGINATING_TIMESTAMP"
    last_run_metadata_path => "/opt/exabeam/config/.last_run"
  }
}

This is my filter and output config:
filter {
  ruby {
    init => 'require "json"'
    code => '
      message = {}
      keys = event.to_hash.keys
      keys.each { |key|
        if key != "@timestamp"
          message[key] = event.get(key)
          event.remove(key)
        end
      }
      event.set("message", message.to_json)
    '
  }
}
output {
  kafka {
    topic_id => "kafka.topic"
    bootstrap_servers => "KAFKA_HOST:9092"
    codec => json_lines
  }
}

I tried the following settings, but none of them solved the issue:
codec => plain { charset => "ASCII-8BIT" }
codec => plain { charset => "UTF-8" }
codec => line

I'm running Logstash 5.4; I'm not sure whether this issue also exists in 6.1, though.

Has anyone seen this issue before?

The columns_charset option of logstash-input-jdbc allows you to specify the charset of individual columns:

input {
  jdbc {
    columns_charset => {
      "some_field" => "BINARY"
      "another_field" => "UTF-8"
    }
    # ...
  }
}
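Applied to your input, a minimal sketch could look like this. The column name and charset here are assumptions; substitute whichever column of unified_audit_trail actually contains the non-UTF-8 bytes and the encoding your database really uses:

input {
  jdbc {
    # Hypothetical example: re-encode the "sql_text" column from Latin-9
    # to UTF-8 before the event reaches the ruby filter.
    columns_charset => {
      "sql_text" => "ISO-8859-15"
    }
    # ... rest of your jdbc settings ...
  }
}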

Possibly unrelated, but the same error occurs with the XML filter when it is used with the persistent (disk) queue. Are you using the persisted queue in your logstash.yml config? There's a known issue open on GitHub for this as well.
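If you're not sure whether it's enabled, this is the setting to look for in logstash.yml:

# logstash.yml
queue.type: persisted   # the default is "memory"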

I tried setting the charset on the codec, but that did not work for me. I figured out a fix myself:
in the ruby filter I have to force-encode all string values using:

if value.is_a?(String)
  value = value.force_encoding("ISO-8859-15").encode("UTF-8")
end
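Put together with the original filter, a minimal sketch could look like the following. The source encoding ISO-8859-15 is an assumption; adjust it to whatever your database actually stores:

filter {
  ruby {
    init => 'require "json"'
    code => '
      message = {}
      event.to_hash.keys.each { |key|
        next if key == "@timestamp"
        value = event.get(key)
        # Re-encode raw ASCII-8BIT strings to UTF-8 before serializing to JSON.
        if value.is_a?(String)
          value = value.force_encoding("ISO-8859-15").encode("UTF-8")
        end
        message[key] = value
        event.remove(key)
      }
      event.set("message", message.to_json)
    '
  }
}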

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.