Hello community,
I have found other posts here helpful in solving previous issues, so I am hoping someone can help me resolve this one. I researched the problem in the community pages and have not found any recent solutions. I am running the ELK stack via 4 Docker containers. My versions are:
- elasticsearch:8.3.2
- kibana:8.3.2
- logstash:8.3.2
- metricbeat:8.3.3
The problem: this pipeline, usa03000xxxxx, receives data for a while (hours, sometimes days) and then suddenly stops. When I look in the logs I see the following error, but it does not tell me which data column is causing the problem:
[2024-02-12T17:49:23,088][ERROR][logstash.outputs.elasticsearch][usa03000xxxxx][usa03000xxxxx_jdbc_logstash] An unknown error occurred sending a bulk request to Elasticsearch (will retry indefinitely) {:message=>"\"\xD1\" from ASCII-8BIT to UTF-8", :exception=>LogStash::Json::GeneratorError, :backtrace=>[
"/usr/share/logstash/logstash-core/lib/logstash/json.rb:43:in `jruby_dump'",
"/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/outputs/elasticsearch/http_client.rb:127:in `block in bulk'",
"org/jruby/RubyArray.java:2589:in `map'",
"/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/outputs/elasticsearch/http_client.rb:127:in `block in bulk'",
"org/jruby/RubyArray.java:1821:in `each'",
"org/jruby/RubyEnumerable.java:1258:in `each_with_index'",
"/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/outputs/elasticsearch/http_client.rb:125:in `bulk'",
"/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:296:in `safe_bulk'",
"/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:228:in `submit'",
"/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:177:in `retrying_submit'",
"/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/outputs/elasticsearch.rb:342:in `multi_receive'",
"org/logstash/config/ir/compiler/OutputStrategyExt.java:143:in `multi_receive'",
"org/logstash/config/ir/compiler/AbstractOutputDelegatorExt.java:121:in `multi_receive'",
"/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:300:in `block in start_workers'"]}
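If I am reading the trace right, the elasticsearch output is failing while serializing the event to JSON because some field value still carries raw ASCII-8BIT bytes that are not valid UTF-8 (0xD1 would be "Ñ" if the bytes are really ISO-8859-1). Assuming the stray bytes come from the same ISO-8859-1 data as my columns_charset columns, one workaround I am considering is a ruby filter at the end of my filter block that transcodes them before the output sees the event. A rough sketch:

filter {
  ruby {
    # Sketch of my workaround idea: walk the top-level fields and
    # transcode any String still tagged as ASCII-8BIT/BINARY, treating
    # the raw bytes as ISO-8859-1. Every byte 0x00-0xFF is defined in
    # ISO-8859-1, so encode() cannot raise here; it only reinterprets.
    code => '
      event.to_hash.each do |name, value|
        next unless value.is_a?(String) && value.encoding == Encoding::BINARY
        event.set(name, value.encode("UTF-8", "ISO-8859-1"))
      end
    '
  }
}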
My logstash config file:
input {
  jdbc {
    jdbc_driver_library => "/usr/share/logstash/logstash-core/lib/jars/mssql-jdbc-10.2.1.jre11.jar"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_connection_string => "jdbc:sqlserver://usa03000xxxxx.net:45688;databaseName=AuditDbwork;trustServerCertificate=true;encrypt=false"
    jdbc_user => "${msuser}"
    jdbc_password => "${mspw}"
    connection_retry_attempts => 5
    schedule => "*/10 * * * *"
    # bracket-quoted so SQL Server treats starttime as a column name,
    # not a string literal
    statement => "SELECT * FROM radarX_Log WHERE [starttime] >= '2022-07-30' AND [starttime] > :sql_last_value ORDER BY starttime ASC"
    columns_charset => {
      "transactionid" => "ISO-8859-1"
      "binarydata" => "ISO-8859-1"
    }
    last_run_metadata_path => "/usr/share/logstash/data/last_run/usa03000xxxxx"
    record_last_run => true
    tracking_column => "starttime"
    tracking_column_type => "timestamp"
    use_column_value => true
  }
}
filter {
  mutate {
    add_field => {
      "log_source" => "usa03000xxxxx.net"
      "DB_Type" => "MSSQL"
      "Database_Name" => "AuditDBwork"
    }
  }
  grok {
    match => { "starttime" => "(?<date>%{YEAR}[\/\-\s]%{MONTHNUM}[\/\-\s]%{MONTHDAY})" }
  }
  fingerprint {
    source => ["textdata", "dbusername", "eventclass", "ownername", "date", "databasename", "ntusername", "objectname"]
    target => "[@metadata][fingerprint]"
    method => "SHA1"
    key => "7y47ST9qg5F8ezy7"
    base64encode => true
    concatenate_sources => true
  }
}
output {
  elasticsearch {
    hosts => "https://elk030001.net:9200"
    id => "usa03000xxxxx_jdbc_logstash"
    data_stream => "true"
    data_stream_type => "logs"
    data_stream_dataset => "jdbc"
    data_stream_namespace => "usa03000xxxxx"
    document_id => "%{[@metadata][fingerprint]}"
    user => "${logstash_user}"
    password => "${logstash_pw}"
    ssl => true
    cacert => "${cacert}"
  }
}
I found a few posts that look similar to my problem but they appear to be related to older versions of the ELK stack.
- The column names in my columns_charset are already lowercase
- I think my Logstash plugins are already up to date
- A fix for this had already been pushed to the plugin's codec
Is there somewhere in the ELK stack where I can see the data types of the columns selected from this pipeline? So far I have only found the column names and their values but not the data types.
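One thing I am considering in the meantime is pointing the same jdbc input at the SQL Server catalog, since INFORMATION_SCHEMA.COLUMNS lists each column's data type, character set, and collation. A one-off sketch reusing the connection settings from my pipeline:

input {
  jdbc {
    jdbc_driver_library => "/usr/share/logstash/logstash-core/lib/jars/mssql-jdbc-10.2.1.jre11.jar"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_connection_string => "jdbc:sqlserver://usa03000xxxxx.net:45688;databaseName=AuditDbwork;trustServerCertificate=true;encrypt=false"
    jdbc_user => "${msuser}"
    jdbc_password => "${mspw}"
    # one-off catalog query: type, charset, and collation of every
    # column in the table this pipeline reads from
    statement => "SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_SET_NAME, COLLATION_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'radarX_Log'"
  }
}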
Is this problem usually caused by a date/datetime data type?
Is it one of the columns in my filter.fingerprint.source that is the problem?
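To catch the offending value the next time it happens, I am also thinking about adding a second output next to elasticsearch that dumps events with the rubydebug codec; rubydebug prints fields via Ruby's inspect, so invalid bytes show up as escape sequences (e.g. "\xD1") instead of breaking serialization. A sketch (the path is my own choice):

output {
  # the existing elasticsearch output stays as-is; this only adds a
  # local dump I can grep for the field holding the bad bytes
  file {
    path => "/usr/share/logstash/data/debug/usa03000xxxxx-%{+YYYY.MM.dd}.log"
    codec => rubydebug
  }
}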
Will adding characterEncoding=utf8 to my connection string cover everything? Will that cause problems with other data?
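I also noticed the jdbc input has a plugin-level charset option that applies one default encoding to every column, with columns_charset still overriding it per column; maybe that is a safer blanket fix than changing the connection string? A sketch:

input {
  jdbc {
    # ... same connection, schedule, and tracking settings as above ...
    # default encoding applied to all columns; my columns_charset
    # entries still take precedence for the columns listed there
    charset => "ISO-8859-1"
  }
}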