Encoding issue depending on how Logstash is started

Hello,

I have two Windows machines: one is a Windows Server and the other is a standard Windows desktop PC. Both have the same Elasticsearch (8.4.3) and the same Logstash (8.4.3) installed, with the same configuration and the same pipelines, pulling their data from the same Microsoft SQL Server.

I am now trying to query some NVARCHAR fields (which SQL Server always encodes as UTF-16) via the Logstash jdbc input plugin, without specifying any special encoding or charset settings in either the input or the output plugins, on both machines.

On the Windows Server everything works perfectly and the documents are indexed into Elasticsearch as they should, without errors. It makes no difference whether Logstash runs as a Windows service via NSSM or whether I run it manually from the command line.
On the desktop PC, on the other hand, I receive an error for every single document Elasticsearch tries to index, stating that the encodings are incompatible. (Again, it's irrelevant whether I run Logstash manually from the command line or as a Windows service.)

[2022-10-20T17:01:57,061][ERROR][logstash.outputs.elasticsearch][index_name][9648a8b8c103d11863b72d1b6d9624b2c3b8d672ae4baf73a17af87e6cc0c3e7] 
An unknown error occurred sending a bulk request to Elasticsearch (will retry indefinitely) {:message=>"incompatible encodings: CP850 and UTF-8", :exception=>Encoding::CompatibilityError, 
:backtrace=>[
    "org/jruby/ext/stringio/StringIO.java:1162:in `write'", 
    "C:/Program Files/ElasticSearch/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/outputs/elasticsearch/http_client.rb:142:in `block in bulk'", 
    "org/jruby/RubyArray.java:1865:in `each'", 
    "org/jruby/RubyEnumerable.java:1143:in `each_with_index'", 
    "C:/Program Files/ElasticSearch/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/outputs/elasticsearch/http_client.rb:125:in `bulk'", 
    "C:/Program Files/ElasticSearch/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:296:in `safe_bulk'", 
    "C:/Program Files/ElasticSearch/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:228:in `submit'", 
    "C:/Program Files/ElasticSearch/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:177:in `retrying_submit'", 
    "C:/Program Files/ElasticSearch/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/outputs/elasticsearch.rb:342:in `multi_receive'", 
    "org/logstash/config/ir/compiler/AbstractOutputDelegatorExt.java:121:in `multi_receive'", "C:/Program Files/ElasticSearch/logstash/logstash-core/lib/logstash/java_pipeline.rb:300:in `block in start_workers'"]}

Now it gets really interesting: if I call "logstash.bat" on the Windows desktop PC from the command line manually AND redirect standard output (and/or standard error) to a text file on disk, it WORKS perfectly and indexes everything as it should, without errors.

How can there be a difference in the internal encoding depending on how the batch file is called, and why does it work on the server no matter what?
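
My only guess so far is that JRuby derives its default external encoding from the active console code page (CP850 here) and falls back to the JVM's file.encoding once stdout is no longer a console. If that is the cause, pinning the default charset in Logstash's config/jvm.options might work around it; this is just a guess, I haven't verified it:

# config/jvm.options, hypothetical workaround (unverified):
# pin the JVM default charset so JRuby no longer derives it
# from the console code page
-Dfile.encoding=UTF-8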

Has anyone encountered something similar in the past?

Thanks in advance for your help.
Simon

You are not alone. There are threads here and here reporting this issue after upgrading to 8.4.2. Neither thread provided any resolution.

It is baffling that output redirection affects whether you get the exception.

Thanks for pointing me at the two other threads.

If I could reliably reproduce the error, I would probably open an issue on GitHub, but I don't even really understand how exactly it occurs.

@simon137 can you post a raw log somewhere? 50-100 lines, not modified by any editor.

I could, but it's really just exactly the same error I posted above, over and over again, only with a different timestamp. There's nothing else. The only thing I changed was the line breaks.

As I said before on another topic, I had to set the encoding for the SQL Server 2017 error log on Windows Server 2012, with a Filebeat version around 7.10.

- module: mssql
  # Fileset for native deployment
  log:
    enabled: true
    var.paths: ["C:/Microsoft SQL Server/MSSQL14.SQLITS/MSSQL/Log/ERRORLOG*"]
    encoding: UTF-16

Without a sample log/data it's hard to help.

Thank you :slight_smile: I'm sorry, but I can't find that info in either of the two linked threads. Where did you set these settings? Is that in Filebeat? I don't think I'm using that; I just query the server via the jdbc input plugin for Logstash, like this:

input {
  jdbc {
    jdbc_driver_library => "C:\\ProgramData\\ElasticSearch\\logstash\\drivers\\mssql-jdbc-10.2.0.jre8.jar"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_connection_string => "jdbc:sqlserver://server\instance;databasename=database;trustServerCertificate=true"
    jdbc_default_timezone => "Europe/Berlin"
    jdbc_user => "user"
    jdbc_password => "pw"
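    # run every 5 minutes during the hours 06:00-19:59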
    schedule => "*/5 6-19 * * *"
    statement_filepath => "C:\\ProgramData\\ElasticSearch\\logstash\\pipelines\\index_name\\queries\\aktivitaeten.sql"
    clean_run => false
    use_column_value => true
    tracking_column => "editdate"
    tracking_column_type => "timestamp"
    last_run_metadata_path => "C:\\ProgramData\\ElasticSearch\\logstash\\pipelines\\index_name\\.logstash_jdbc_last_run"
  }
}

As for the logs, it's really not that I don't want to give them to you, but there just isn't anything more in them than this one error over and over again, preceded by the SQL query that the jdbc plugin executes. That's it.

Precisely because the log was giving me nothing, I went and monitored the network traffic on both systems to see how the data was being transferred to Elasticsearch. You could clearly see that the server used UTF-8, which is what Elasticsearch expects (as an example, it used the two bytes "C3 A4" for an "ä"), while the desktop PC used the single byte "E4" for an "ä". Except, as I said, when I redirect stdout :joy: It literally changes the way the data is sent to Elasticsearch, and I don't get why.
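
If anyone wants to check where the encoding flips, one thing I could imagine trying (a debugging sketch, not something from my production config) is a ruby filter that prints the Ruby encoding of every string field as events pass through the pipeline:

filter {
  # debugging aid: print the Ruby encoding of each string field
  ruby {
    code => '
      event.to_hash.each do |name, value|
        puts "#{name}: #{value.encoding}" if value.is_a?(String)
      end
    '
  }
}

Ironically, since this prints to stdout, redirecting the output might again change the observed behavior.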

Yes, that was Filebeat, just as an example. I had the same issue there.

For your case, try:
columns_charset => { "column0" => "UTF-16" }
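
That goes inside the jdbc input block, for example ("column0" is just a placeholder; use your actual NVARCHAR column name):

input {
  jdbc {
    # ... your existing jdbc settings ...
    # "column0" is a placeholder for the real NVARCHAR column name
    columns_charset => { "column0" => "UTF-16" }
  }
}

I believe the plugin also has a plugin-wide charset option, in case all string columns share one encoding.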

Thanks for the tip, but I already tried that, and unfortunately it doesn't work either, no matter what encoding I use (UTF-16 would be correct, but in my tests I also tried ISO-8859-1, UTF-8, and several others).

I just checked with a few upgrades and downgrades, and it seems everything worked fine up to v8.3.3; since v8.4.0 it doesn't anymore.
So I went ahead and created an issue on GitHub for it: Logstash throws "Incompatible Encodings" error when querying NVARCHAR-Fields from MSSQL-Server · Issue #14679 · elastic/logstash · GitHub

I'm pretty sure this pull request might have been the breaking change here: https://github.com/elastic/logstash/pull/13523

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.