MySQL to Elasticsearch via Logstash: incompatible encodings: CP850 and UTF-8

I am using an ELK stack via docker-compose, with Elasticsearch version 8.4.0.

My goal is to use Logstash to copy an entire table from my MySQL DB to Elasticsearch. The connection works and Logstash copies about 30 entries without problems, but then I get a long error message:

[2022-09-10T18:41:26,318][ERROR][logstash.outputs.elasticsearch][main][757e3825fce0788f949869472d03e028630de9d063200717b56bc9ceefe29d81] An unknown error occurred sending a bulk request to Elasticsearch (will retry indefinitely) {:message=>"incompatible encodings: CP850 and UTF-8", :exception=>Encoding::CompatibilityError, :backtrace=>["org/jruby/ext/stringio/StringIO.java:1162:in `write'", "D:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/outputs/elasticsearch/http_client.rb:142:in `block in bulk'", "org/jruby/RubyArray.java:1865:in `each'", "org/jruby/RubyEnumerable.java:1143:in `each_with_index'", "D:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/outputs/elasticsearch/http_client.rb:125:in `bulk'", "D:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:296:in `safe_bulk'", "D:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:228:in `submit'", "D:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:177:in `retrying_submit'", "D:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/outputs/elasticsearch.rb:342:in `multi_receive'", "org/logstash/config/ir/compiler/AbstractOutputDelegatorExt.java:121:in `multi_receive'", "D:/logstash/logstash-core/lib/logstash/java_pipeline.rb:300:in `block in start_workers'"]}

I suspect this part of the message is the root cause: {:message=>"incompatible encodings: CP850 and UTF-8", :exception=>Encoding::CompatibilityError

My config file looks like this:

input {
  jdbc {
    clean_run => true
    jdbc_driver_library => "D:\logstash\mysql-connector-java-8.0.30.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/posts"
    jdbc_user => "sqluser"
    jdbc_password => "sqlpassword"
    schedule => "* * * * *"
    statement => "SELECT id, id_post, url, id_subforum, author, text, spread, date, added
    FROM telegram.channel_results WHERE id > :sql_last_value;"
    use_column_value => true
    tracking_column => "id"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "posts"
    user => "username"
    password => "password"
  }

  stdout {
    codec => rubydebug
  }
}

I noticed that if I remove the text column from the query, the process runs without any problems. In my database, the text column is of SQL type TEXT. I suspect an encoding problem, because the texts also contain Russian characters and emojis. I need a solution to copy these texts to ES as well. Could it be an encoding problem with the emojis and other special characters in the text?
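One thing I could still check is whether the JDBC connection itself negotiates UTF-8; useUnicode and characterEncoding are standard Connector/J URL parameters, so the connection string could be extended like this (untested on my side):

jdbc_connection_string => "jdbc:mysql://localhost:3306/posts?useUnicode=true&characterEncoding=UTF-8"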

You could try forcing the encoding in a ruby filter:

ruby { code => 'event.set("text", event.get("text").force_encoding(::Encoding::UTF_8))' }
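Note that force_encoding only relabels the bytes without converting them, so it can still fail if the bytes are genuinely not valid UTF-8. A more defensive variant might look like this (a sketch: it assumes the stray bytes really are CP850, as the error message suggests, and it guards against NULL columns):

filter {
  ruby {
    code => '
      t = event.get("text")
      unless t.nil?
        # Relabel first: the bytes are often already valid UTF-8 and
        # merely tagged with the wrong encoding by the input.
        u = t.dup.force_encoding(::Encoding::UTF_8)
        unless u.valid_encoding?
          # Genuinely non-UTF-8 bytes: transcode from CP850 (an assumption),
          # replacing anything that cannot be mapped.
          u = t.encode(::Encoding::UTF_8, ::Encoding::CP850, invalid: :replace, undef: :replace)
        end
        event.set("text", u)
      end
    '
  }
}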

Hello, thank you very much. I will try it and give feedback.

Unfortunately, I still get the error.

I added your code to my configuration file:

input {
  jdbc {
    clean_run => true
    jdbc_driver_library => "D:\logstash\mysql-connector-java-8.0.30.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/posts"
    jdbc_user => "sqluser"
    jdbc_password => "sqlpassword"
    schedule => "* * * * *"
    statement => "SELECT id, id_post, url, id_subforum, author, text, spread, date, added
    FROM telegram.channel_results WHERE id > :sql_last_value;"
    use_column_value => true
    tracking_column => "id"
  }
}

filter {
  ruby {
    code => 'event.set("text", event.get("text").force_encoding(::Encoding::UTF_8))'
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "posts"
    user => "username"
    password => "password"
  }

  stdout {
    codec => rubydebug
  }
}

Is that the right position for the filter?

Btw., the encoding of the MySQL source database is utf8mb4 (collation utf8mb4_unicode_ci).
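If the ruby filter does not help, the jdbc input plugin also has a columns_charset option, which declares the encoding of individual columns so the input can convert them to UTF-8 before the event is built. A sketch (the CP850 value is only a guess taken from the error message):

input {
  jdbc {
    # ... same settings as above ...
    # Declare how the raw bytes of this one column are encoded
    # (CP850 is an assumption based on the error message).
    columns_charset => { "text" => "CP850" }
  }
}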

Hello,
I have the same problem, but with Filebeat as input instead of MySQL. I have searched for many answers, but none worked.
I'm new to the ES community, but I think your filter is in the right place.
I hope someone can help us.

Hello,
We have not had a response in a long time; can someone from Elastic answer?
As a temporary fix for the problem I replace é with e, à with a, ù with u, etc., but can someone find us a solution to convert the message to UTF-8?
Please
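For the Filebeat case specifically, declaring the file's encoding on the input is usually cleaner than rewriting characters, since Filebeat then transcodes the lines to UTF-8 itself. A sketch for filebeat.yml, assuming the files are written in a Western European code page (the path and the latin1 value are assumptions to adapt):

filebeat.inputs:
  - type: log
    paths:
      - /var/log/app/*.log
    # Declare the on-disk encoding so Filebeat converts the lines to UTF-8.
    # latin1 is an assumption; match it to how the files are actually written.
    encoding: latin1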
