Unknown error occurred sending a bulk request to Elasticsearch

Hello community,

I have found other posts here helpful in solving previous issues, so I am hoping someone can help me resolve this one. I researched the problem in the community pages and have not found any recent solutions. I am running the ELK stack as 4 Docker containers. My versions are:

  • elasticsearch:8.3.2
  • kibana:8.3.2
  • logstash:8.3.2
  • metricbeat:8.3.3

The problem is that this pipeline, usa03000xxxxx, receives data for a while (hours, sometimes days) and then suddenly stops. When I look in the logs I see the following error, but it does not tell me which data column is causing the problem:

[2024-02-12T17:49:23,088][ERROR][logstash.outputs.elasticsearch][usa03000xxxxx][usa03000xxxxx_jdbc_logstash] An unknown error occurred sending a bulk request to Elasticsearch (will retry indefinitely) {:message=>""\xD1" from ASCII-8BIT to UTF-8", :exception=>LogStash::Json::GeneratorError, :backtrace=>["/usr/share/logstash/logstash-core/lib/logstash/json.rb:43:in jruby_dump'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/outputs/elasticsearch/http_client.rb:127:in block in bulk'", "org/jruby/RubyArray.java:2589:in map'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/outputs/elasticsearch/http_client.rb:127:in block in bulk'", "org/jruby/RubyArray.java:1821:in each'", "org/jruby/RubyEnumerable.java:1258:in each_with_index'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/outputs/elasticsearch/http_client.rb:125:in bulk'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:296:in safe_bulk'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:228:in submit'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:177:in retrying_submit'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/outputs/elasticsearch.rb:342:in multi_receive'", "org/logstash/config/ir/compiler/OutputStrategyExt.java:143:in multi_receive'", "org/logstash/config/ir/compiler/AbstractOutputDelegatorExt.java:121:in multi_receive'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:300:in block in start_workers'"]}

My logstash config file:

input {
  jdbc {
    jdbc_driver_library => "/usr/share/logstash/logstash-core/lib/jars/mssql-jdbc-10.2.1.jre11.jar"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_connection_string => "jdbc:sqlserver://usa03000xxxxx.net:45688;databaseName=AuditDbwork;trustServerCertificate=true;encrypt=false"
    jdbc_user => "${msuser}"
    jdbc_password => "${mspw}"
    connection_retry_attempts => 5
    schedule => "*/10 * * * *"
    statement => "SELECT * FROM radarX_Log WHERE 'starttime' >= '2022-07-30' and 'starttime' > :sql_last_value ORDER BY starttime ASC"
    columns_charset => { "transactionid" => "ISO-8859-1" }
    columns_charset => { "binarydata" => "ISO-8859-1" }
    last_run_metadata_path => "/usr/share/logstash/data/last_run/usa03000xxxxx"
    record_last_run => true
    tracking_column => "starttime"
    tracking_column_type => "timestamp"
    use_column_value => true
       }
}

filter {
  mutate {
    add_field => {
      "log_source" => "usa03000xxxxx.net"
      "DB_Type" => "MSSQL"
      "Database_Name" => "AuditDBwork"
    }
  }
  grok {
    match => { "starttime" => "(?<date>%{YEAR}[\/\-\s]%{MONTHNUM}[\/\-\s]%{MONTHDAY})" }
  }
  fingerprint {
    source => ["textdata", "dbusername", "eventclass", "ownername", "date", "databasename", "ntusername", "objectname"]
    target => "[@metadata][fingerprint]"
    method => "SHA1"
    key => "7y47ST9qg5F8ezy7"
    base64encode => true
    concatenate_sources => true
  }
}

output {
  elasticsearch {
    hosts => "https://elk030001.net:9200"
    id => "usa03000xxxxx_jdbc_logstash"
    data_stream => "true"
    data_stream_type => "logs"
    data_stream_dataset => "jdbc"
    data_stream_namespace => "usa03000xxxxx"
    document_id => "%{[@metadata][fingerprint]}"
    user => "${logstash_user}"
    password => "${logstash_pw}"
    ssl => true
    cacert => "${cacert}"
  }
}

I found a few posts that look similar to my problem but they appear to be related to older versions of the ELK stack.

Is there somewhere in the ELK stack where I can see the data types of the columns selected from this pipeline? So far I have only found the column names and their values but not the data types.

Is this problem usually caused by a date/datetime data type?

Is it one of the columns in my filter.fingerprint.source that is the problem?

Will adding characterEncoding=utf8 to my connection string cover everything? Will that cause problems with other data?

That is blowing up in Logstash's JSON serialization (logstash/json.rb in the backtrace) when the event is converted to JSON for the bulk request. Per a comment on the same error in a different context, it means you have non-UTF-8 characters in one of your fields.

OK, if in a ruby filter you do

text = [0xD1].pack("C*")
text.to_json

you will get

"\xD1" from ASCII-8BIT to UTF-8 {:class=>"Encoding::UndefinedConversionError"

You have a couple of options. If you know the encoding of the string, you could try this ruby code

text = event.get("problemField") 
text = text.force_encoding("iso-8859-1").encode("utf-8")
event.set("problemField", text)

which will get you

"problemField" => "Ñ",

I am not telling you that you have iso-8859-1 encoded text. It's plausible but you need to determine if this is true.
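
If you want to check, one quick way (just a sketch; problemField is a placeholder, as above) is to dump the raw bytes of the suspect field with a temporary ruby filter and compare them against an ISO-8859-1 / Windows-1252 code table (0xD1 is Ñ in both):

filter {
  ruby {
    # Temporary debug filter: write the field's bytes as hex into a scratch
    # field so they show up in a rubydebug output.
    code => '
      raw = event.get("problemField")
      if raw.is_a?(String)
        event.set("problemField_hex", raw.bytes.map { |b| format("%02X", b) }.join(" "))
      end
    '
  }
}

If the hex values line up with sensible Latin-1 characters, the force_encoding approach above is probably safe.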

If the encoding is unknown or varies then a more aggressive approach would be

event.set("problemField", text.encode("UTF-8", "binary", :invalid => :replace, :undef => :replace)

which will get you

"problemField" => "�",

I recognize that losing data is bad, but your fields have to be valid UTF-8 to be sent to elasticsearch. It is not optional.

I assume you know which fields are likely to have non-UTF-8 data in them. If you do not, you will have to iterate over the fields of the event. The sketch below should give you some ideas.
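
For example, something along these lines (only a sketch; it only checks top-level string fields, and it reuses the replace options from above):

filter {
  ruby {
    code => '
      event.to_hash.each do |name, value|
        next unless value.is_a?(String)
        # Strings whose bytes are not valid UTF-8 get the bad bytes replaced,
        # and the event is tagged so you can find it later.
        unless value.dup.force_encoding("UTF-8").valid_encoding?
          event.set(name, value.encode("UTF-8", "binary", :invalid => :replace, :undef => :replace))
          event.tag("encodingProblem")
        end
      end
    '
  }
}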

I hope this helps you to understand the issue.

Another thing you could do in ruby is

initialEvent = event.to_json # String.to_json crashes, Hash.to_json does not
fixedEvent   = initialEvent.encode("UTF-8", "binary", :replace => "x", :invalid => :replace, :undef => :replace)
if fixedEvent != initialEvent
  event.tag("encodingProblem")
end

then route to a different output based on "encodingProblem" in [tags] (a routing sketch follows the example below). If you use a rubydebug output, start looking for \x, as in

"problemField" => "\xD1",
        "tags" => [
    [0] "encodingProblem"
],
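
A sketch of that routing (the stdout output is just for inspection; keep your real elasticsearch settings in the else branch):

output {
  if "encodingProblem" in [tags] {
    # Suspect events go somewhere you can inspect them.
    stdout { codec => rubydebug }
  } else {
    elasticsearch {
      # ... your existing elasticsearch output settings ...
    }
  }
}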
