Hi,
I am using the ELK stack v8.4.1 and trying to integrate data between SQL Server and Elasticsearch via Logstash. My source table includes Turkish characters (collation SQL_Latin1_General_CP1_CI_AS). When Logstash writes these characters to Elasticsearch, the Turkish characters are converted to '?', for example 'Şükrü' => '??kr?'. (I used the ELK stack v7.* before and didn't have this problem.)
This is my config file:
input {
  jdbc {
    jdbc_connection_string => "jdbc:sqlserver://my-sql-connection-info;encrypt=false;characterEncoding=utf8"
    jdbc_user => "my_sql_user"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_driver_library => "my_path\mssql-jdbc-11.2.0.jre11.jar"
    statement => [ "Select id,name,surname FROM ELK_Test" ]
    schedule => "*/30 * * * * *"
  }
  stdin {
    codec => plain { charset => "UTF-8" }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "test_index"
    document_id => "%{id}"
    user => "logstash_user"
    password => "password"
  }
  stdout { codec => rubydebug }
}
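Would the jdbc input's charset / columns_charset options be relevant here? This is a sketch of what I mean (the Windows-1254 guess and the column list are only assumptions on my part, I have not confirmed this is the right fix):
input {
  jdbc {
    # ... same connection settings as above ...
    # Assumption: the column bytes might not really be UTF-8;
    # Windows-1254 (Turkish) is only a guess
    columns_charset => {
      "name"    => "Windows-1254"
      "surname" => "Windows-1254"
    }
  }
}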
I tried with and without a filter to force the encoding to UTF-8, but nothing changes.
filter {
  ruby {
    code => 'event.set("name", event.get("name").force_encoding(::Encoding::UTF_8))'
  }
}
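I also wondered whether I should transcode instead of just relabeling the bytes, i.e. use encode (which converts the bytes) instead of force_encoding (which only changes the label). A sketch of what I mean, where the Windows-1254 source encoding is only a guess:
filter {
  ruby {
    # Assumption: the bytes might actually be Windows-1254 (Turkish) rather than UTF-8.
    # encode converts the bytes; force_encoding only changes the label.
    code => '
      ["name", "surname"].each do |f|
        v = event.get(f)
        event.set(f, v.encode("UTF-8", "Windows-1254")) if v.is_a?(String)
      end
    '
  }
}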
Below is my Elasticsearch result:
{
  "_index": "test_index",
  "_id": "2",
  "_score": 1,
  "_source": {
    "name": "??kr?",
    "@version": "1",
    "id": 2,
    "surname": "?e?meci",
    "@timestamp": "2022-09-16T13:02:00.254013300Z"
  }
}
BTW, the console output (rubydebug) is correct:
{
    "name" => "Şükrü",
    "@version" => "1",
    "id" => 2,
    "surname" => "Çeşmeci",
    "@timestamp" => 2022-09-16T13:32:00.851877400Z
}
I tried to insert sample data from the Kibana Dev Tools and it was inserted without a problem. Can anybody help, please? What could be wrong? What should I check?