Encoding issue depending on how Logstash is started

Hello,

I have two Windows machines: one is a Windows Server and the other is a standard Windows desktop PC. Both have the same Elasticsearch (8.4.3) and the same Logstash (8.4.3) installed, with the same configuration and the same pipelines, pulling their data from the same Microsoft SQL Server.

I am now trying to query some NVARCHAR fields (which SQL Server always encodes as UTF-16) via the Logstash jdbc input plugin, without specifying any special encoding or charset settings in either the input or the output plugins, on both machines.

On the Windows Server everything works perfectly and the documents are indexed into Elasticsearch as they should, without errors. It makes no difference whether Logstash runs as a Windows service via NSSM or whether I run it manually from the command line.
On the desktop PC, on the other hand, I receive an error for every single document Elasticsearch tries to index, stating that the encodings are incompatible. (Again, it's irrelevant whether I run Logstash manually from the command line or as a Windows service.)

[2022-10-20T17:01:57,061][ERROR][logstash.outputs.elasticsearch][index_name][9648a8b8c103d11863b72d1b6d9624b2c3b8d672ae4baf73a17af87e6cc0c3e7] 
An unknown error occurred sending a bulk request to Elasticsearch (will retry indefinitely) {:message=>"incompatible encodings: CP850 and UTF-8", :exception=>Encoding::CompatibilityError, 
:backtrace=>[
    "org/jruby/ext/stringio/StringIO.java:1162:in `write'", 
    "C:/Program Files/ElasticSearch/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/outputs/elasticsearch/http_client.rb:142:in `block in bulk'", 
    "org/jruby/RubyArray.java:1865:in `each'", 
    "org/jruby/RubyEnumerable.java:1143:in `each_with_index'", 
    "C:/Program Files/ElasticSearch/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/outputs/elasticsearch/http_client.rb:125:in `bulk'", 
    "C:/Program Files/ElasticSearch/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:296:in `safe_bulk'", 
    "C:/Program Files/ElasticSearch/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:228:in `submit'", 
    "C:/Program Files/ElasticSearch/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:177:in `retrying_submit'", 
    "C:/Program Files/ElasticSearch/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/outputs/elasticsearch.rb:342:in `multi_receive'", 
    "org/logstash/config/ir/compiler/AbstractOutputDelegatorExt.java:121:in `multi_receive'", "C:/Program Files/ElasticSearch/logstash/logstash-core/lib/logstash/java_pipeline.rb:300:in `block in start_workers'"]}

Now it gets really interesting: if I call "logstash.bat" on the Windows desktop PC from the command line manually AND redirect standard output (and/or standard error) to a text file on disk, it WORKS perfectly and indexes everything as it should, without errors.

How can there be a difference in the internal encoding depending on how the batch file is called, and why does it work on the server no matter what?
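
My only guess so far is that JRuby derives its default external encoding from the active console code page (CP850 here) and falls back to the JVM's file.encoding once stdout is no longer a console. If that is the cause, pinning the default charset in Logstash's config/jvm.options might work around it; this is just a guess, I haven't verified it:

# config/jvm.options, hypothetical workaround (unverified):
# pin the JVM default charset so JRuby no longer derives it
# from the console code page
-Dfile.encoding=UTF-8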

Has anyone encountered something similar in the past?

Thanks in advance for your help.
Simon

You are not alone. There are threads here and here reporting this issue after upgrading to 8.4.2. Neither thread provided any resolution.

It is baffling that output redirection affects whether you get the exception.

Thanks for pointing me at the two other threads.

If I could reliably reproduce the error, I would probably open an issue on GitHub, but I don't even really understand how exactly it occurs.

@simon137 can you post a raw log somewhere? 50-100 lines, not modified by any editor.

I could, but it's really just exactly the same error I posted above, over and over again, only with a different timestamp. There's nothing else. The only thing I changed was the line breaks.

As I said before on another topic, I had to set the encoding for the SQL Server 2017 error log on Windows Server 2012, with a Filebeat version around 7.10.

- module: mssql
  # Fileset for native deployment
  log:
    enabled: true
    var.paths: ["C:/Microsoft SQL Server/MSSQL14.SQLITS/MSSQL/Log/ERRORLOG*"]
    encoding: UTF-16

Without a sample log/data it's hard to help.

Thank you :slight_smile: I'm sorry, but I can't find that info in either of the two linked threads. Where did you set these settings? Is that in Filebeat? I don't think I'm using that; I just query the server via the jdbc input plugin for Logstash, like this:

input {
  jdbc {
    jdbc_driver_library => "C:\\ProgramData\\ElasticSearch\\logstash\\drivers\\mssql-jdbc-10.2.0.jre8.jar"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_connection_string => "jdbc:sqlserver://server\instance;databasename=database;trustServerCertificate=true"
    jdbc_default_timezone => "Europe/Berlin"
    jdbc_user => "user"
    jdbc_password => "pw"
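    # run every 5 minutes during the hours 06:00-19:59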
    schedule => "*/5 6-19 * * *"
    statement_filepath => "C:\\ProgramData\\ElasticSearch\\logstash\\pipelines\\index_name\\queries\\aktivitaeten.sql"
    clean_run => false
    use_column_value => true
    tracking_column => "editdate"
    tracking_column_type => "timestamp"
    last_run_metadata_path => "C:\\ProgramData\\ElasticSearch\\logstash\\pipelines\\index_name\\.logstash_jdbc_last_run"
  }
}

As for the logs, it's really not that I don't want to give them to you, but there just isn't anything more in them than this one error over and over again, preceded by the SQL query that the jdbc plugin executes. That's it.

Precisely because the log was giving me nothing, I went and monitored the network traffic on both systems to see how the data was being transferred to Elasticsearch. You could clearly see that the server used UTF-8, which is what Elasticsearch expects (as an example, it used the two bytes "C3 A4" for an "ä"), while the desktop PC used the single byte "E4" for an "ä". Except, as I said, when I redirect stdout :joy: It literally changes the way the data is sent to Elasticsearch, and I don't get why.
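
If anyone wants to check where the encoding flips, one thing I could imagine trying (a debugging sketch, not something from my production config) is a ruby filter that prints the Ruby encoding of every string field as events pass through the pipeline:

filter {
  # debugging aid: print the Ruby encoding of each string field
  ruby {
    code => '
      event.to_hash.each do |name, value|
        puts "#{name}: #{value.encoding}" if value.is_a?(String)
      end
    '
  }
}

Ironically, since this prints to stdout, redirecting the output might again change the observed behavior.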

Yes, that was Filebeat, just as an example. I had the same issue there.

For your case, try:
columns_charset => { "column0" => "UTF-16" }
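
That goes inside the jdbc input block, for example ("column0" is just a placeholder; use your actual NVARCHAR column name):

input {
  jdbc {
    # ... your existing jdbc settings ...
    # "column0" is a placeholder for the real NVARCHAR column name
    columns_charset => { "column0" => "UTF-16" }
  }
}

I believe the plugin also has a plugin-wide charset option, in case all string columns share one encoding.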

Thanks for the tip, but I already tried that, and unfortunately it doesn't work either, no matter what encoding I use (UTF-16 would be correct, but in my tests I also tried ISO-8859-1, UTF-8, and several others).

I just checked with a few upgrades and downgrades, and it seems everything worked fine up to v8.3.3; since v8.4.0 it doesn't anymore.
So I went ahead and created an issue on GitHub for it: Logstash throws "Incompatible Encodings" error when querying NVARCHAR-Fields from MSSQL-Server · Issue #14679 · elastic/logstash · GitHub

I'm pretty sure this pull request might have been the breaking change here: https://github.com/elastic/logstash/pull/13523

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.