Hi! I'm having a problem with how Logstash with the JDBC input escapes characters.
When I run the SQL query in SSMS I get the expected result: "\publicerat\IN0022.pdf"
The JDBC input looks like this:
````
input {
  jdbc {
    jdbc_connection_string => "jdbc:sqlserver://lifescience;"
    jdbc_user => "xxx"
    jdbc_password => "xxx"
    jdbc_driver_library => "./logstash-core/lib/jars/mssql-jdbc-9.2.1.jre8.jar"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    statement => "select id, path from table"
    tracking_column => "mrec_datetime"
    tracking_column_type => "timestamp"
    use_column_value => true
    last_run_metadata_path => "./logs/.logstash_jdbc_last_run"
    schedule => "*/30 * * * * *"
  }
}
````
But when I query Elasticsearch with:
````
GET /myindex/_search
{
  "from" : 0, "size" : 1000,
  "query" : {
    "match_all": {}
  }
}
````
I can see that it has been indexed as "path": "\"\\publicerat\\IN0022.pdf\"".
How can I disable the escaping that is happening here?
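If the quotes and doubled backslashes really are part of the stored value (and not just how the JSON response renders them), one option I considered is stripping them in the pipeline with a mutate/gsub filter — just a rough sketch, not something I have verified:

````
filter {
  mutate {
    # remove literal double-quote characters from the path value;
    # only relevant if they are actually stored in the document
    gsub => [ "path", '"', '' ]
  }
}
````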
One thought I had was that the field might be of a different type, which it was ("path" was just text without any analyzer or anything).
But I recreated the index and reindexed the documents with the same settings as the index created by FSCrawler, and it still encodes every " as \" and every \ as \\:
"path": {
"type": "keyword",
"fields": {
"fulltext": {
"type": "text"
},
"tree": {
"type": "text",
"analyzer": "fscrawler_path",
"fielddata": true
}
}
},
"virtual": {
"type": "keyword",
"fields": {
"fulltext": {
"type": "text"
},
"tree": {
"type": "text",
"analyzer": "fscrawler_path",
"fielddata": true
}
}
}
*virtual is from FSCrawler and "path" is from Logstash/JDBC/SQL Server
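One way to tell whether the backslashes are really doubled in the stored value, or whether this is just JSON string escaping in the response, is to search for the exact string as it was returned — "path" is a keyword field in the mapping above, so a term query should work (index and field names as in my setup):

````
GET /myindex/_search
{
  "query": {
    "term": {
      "path": "\"\\publicerat\\IN0022.pdf\""
    }
  }
}
````

If this matches, the stored value is literally "\publicerat\IN0022.pdf" (one backslash each, plus the surrounding quote characters), and the \\ and \" only appear because the _search response is JSON, where those characters have to be escaped.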
Just to follow up: the issue we're facing is related to this issue in Logstash:
(GitHub issue opened 21 Oct 2022; labels: bug, status:needs-triage)
**Logstash information**:
1. Logstash version: 8.4.3 (I checked: the issue came in from v8.3.3 to v8.4.0)
2. Logstash installation source: expanded from tar or zip archive
3. How is Logstash being run: via the command line or as a Windows service
**Plugins installed**: no extra plugins were installed
**JVM** (e.g. `java -version`): Bundled JDK:
openjdk 17.0.4 2022-07-19
OpenJDK Runtime Environment Temurin-17.0.4+8 (build 17.0.4+8)
OpenJDK 64-Bit Server VM Temurin-17.0.4+8 (build 17.0.4+8, mixed mode, sharing)
-> but also tested with:
openjdk 11.0.15 2022-04-19
OpenJDK Runtime Environment Temurin-11.0.15+10 (build 11.0.15+10)
OpenJDK 64-Bit Server VM Temurin-11.0.15+10 (build 11.0.15+10, mixed mode)
**OS version**: Windows 10
**Description of the problem including expected versus actual behavior**:
If I query some NVARCHAR fields from a Microsoft SQL Server (where NVARCHAR is always encoded as UTF-16) via the logstash-jdbc-input plugin, without specifying any special encoding or charset settings in either the input or the output Logstash plugins, Logstash fails to transfer the events to Elasticsearch, throwing this error over and over again for every document:
````
[2022-10-20T17:01:57,061][ERROR][logstash.outputs.elasticsearch][index_name][9648a8b8c103d11863b72d1b6d9624b2c3b8d672ae4baf73a17af87e6cc0c3e7]
An unknown error occurred sending a bulk request to Elasticsearch (will retry indefinitely) {:message=>"incompatible encodings: CP850 and UTF-8", :exception=>Encoding::CompatibilityError,
:backtrace=>[
"org/jruby/ext/stringio/StringIO.java:1162:in `write'",
"C:/Program Files/ElasticSearch/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/outputs/elasticsearch/http_client.rb:142:in `block in bulk'",
"org/jruby/RubyArray.java:1865:in `each'",
"org/jruby/RubyEnumerable.java:1143:in `each_with_index'",
"C:/Program Files/ElasticSearch/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/outputs/elasticsearch/http_client.rb:125:in `bulk'",
"C:/Program Files/ElasticSearch/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:296:in `safe_bulk'",
"C:/Program Files/ElasticSearch/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:228:in `submit'",
"C:/Program Files/ElasticSearch/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:177:in `retrying_submit'",
"C:/Program Files/ElasticSearch/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/outputs/elasticsearch.rb:342:in `multi_receive'",
"org/logstash/config/ir/compiler/AbstractOutputDelegatorExt.java:121:in `multi_receive'", "C:/Program Files/ElasticSearch/logstash/logstash-core/lib/logstash/java_pipeline.rb:300:in `block in start_workers'"]}
````
This worked fine up until version 8.3.3 of Logstash; since version 8.4.0 it no longer works.
I also tried specifying the encoding as UTF-16 with the jdbc input plugin's columns_charset option, but this doesn't affect Logstash's behaviour at all.
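For reference, that attempt looked roughly like this (the column name is just a placeholder for the actual NVARCHAR column); it made no difference:

````
input {
  jdbc {
    # ... same settings as above ...
    # per-column charset override of the jdbc input plugin;
    # "my_nvarchar_column" is a placeholder for the real column name
    columns_charset => { "my_nvarchar_column" => "UTF-16" }
  }
}
````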
**Steps to reproduce**:
1. Create a pipeline with the following input.conf (you have to change the connection string to point at your own Microsoft SQL Server, of course):
````
input {
  jdbc {
    jdbc_driver_library => "C:\\ProgramData\\ElasticSearch\\logstash\\drivers\\mssql-jdbc-10.2.0.jre8.jar"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_connection_string => "jdbc:sqlserver://server\instance;databasename=database;trustServerCertificate=true"
    jdbc_default_timezone => "Europe/Berlin"
    jdbc_user => "user"
    jdbc_password => "pw"
    schedule => "*/5 6-19 * * *"
    statement_filepath => "C:\\ProgramData\\ElasticSearch\\logstash\\pipelines\\index_name\\queries\\sqlQuery.sql"
    clean_run => false
    use_column_value => true
    tracking_column => "editdate"
    tracking_column_type => "timestamp"
    last_run_metadata_path => "C:\\ProgramData\\ElasticSearch\\logstash\\pipelines\\index_name\\.logstash_jdbc_last_run"
  }
}
````
2. In the "sqlQuery.sql" just write any SQL-Query that queries at least one NVARCHAR-Field from any table.
3. Create the following output.conf for the pipeline (with different user/pw of course):
````
output {
  elasticsearch {
    hosts => [ "http://localhost:9200" ]
    index => "index_name"
    document_id => "%{document_id}"
    action => "update"
    doc_as_upsert => true
    data_stream => "false"
    user => "elastic"
    password => "pw"
  }
}
````
4. Run Logstash either manually from the command line or as a Windows service and you'll get the above error.
(**Oddly enough**, if you run logstash.bat from the command line and redirect stdout and/or stderr to a file, it works perfectly, without any errors, and indexes everything as it should. I have no idea how output redirection can affect Logstash's behaviour here; to be honest, it just makes no sense.)
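My guess (untested, and just a hypothesis) is that the Windows console code page (CP850) leaks into the string encoding Logstash uses when writing to the console, and not when output is redirected; if that is the case, forcing the JVM default encoding might be a workaround, e.g. in config/jvm.options or via the LS_JAVA_OPTS environment variable:

````
# hypothetical workaround: force the JVM default charset to UTF-8
-Dfile.encoding=UTF-8
````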
Downgrading to 8.3.3 and reindexing everything solved it. The issue still exists in version 8.5.2.
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.