DLQ settings to handle bulk ingest encoding error

I have a large set of archived data to ingest into Elastic, and some of the log entries have encoding issues that build up and cause the Logstash ingest pipeline to stall. I have the DLQ enabled but am still getting indefinite retry errors that eventually kill the ingest pipeline. Is there a way to configure the DLQ settings to handle these errors so they don't block the ingest? Since this is a comparatively small number of log entries, I want these messages dropped or logged to the DLQ so they don't hold up the larger (300+ GB) ingest.

[2022-11-13T14:19:08,872][ERROR][logstash.outputs.elasticsearch][main][5cb44decd2cbc2be7716f495d332a9e2a3a7489f1e19fa85d351fe7e9d0565c2] An unknown error occurred sending a bulk request to Elasticsearch (will retry indefinitely) {:message=>"incompatible encodings: IBM437 and UTF-8", :exception=>Encoding::CompatibilityError, :backtrace=>["org/jruby/ext/stringio/StringIO.java:1162:in `write'", "C:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.9.0-java/lib/logstash/outputs/elasticsearch/http_client.rb:142:in `block in bulk'", "org/jruby/RubyArray.java:1865:in `each'", "org/jruby/RubyEnumerable.java:1143:in `each_with_index'", "C:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.9.0-java/lib/logstash/outputs/elasticsearch/http_client.rb:125:in `bulk'", "C:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.9.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:300:in `safe_bulk'", "C:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.9.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:232:in `submit'", "C:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.9.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:177:in `retrying_submit'", "C:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.9.0-java/lib/logstash/outputs/elasticsearch.rb:369:in `multi_receive'", "org/logstash/config/ir/compiler/AbstractOutputDelegatorExt.java:121:in `multi_receive'", "C:/logstash/logstash-core/lib/logstash/java_pipeline.rb:301:in `block in start_workers'"]}

This error seems to come from the encoding of the bulk request while it is being built, so it occurs even before anything is sent to Elasticsearch.
If the error originated from Elasticsearch, there is a setting to customize which response codes are routed to the DLQ.
But this one is coming straight from Logstash.
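For reference, that setting is dlq_custom_codes on the elasticsearch output, and it only applies to response codes returned by Elasticsearch, not to a client-side exception like this one. A minimal sketch (untested, with hypothetical host and index values):

output {
  elasticsearch {
    hosts => ["https://localhost:9200"]   # hypothetical host
    index => "archive-logs"               # hypothetical index
    # with the DLQ enabled, 400 and 404 responses already go to the DLQ;
    # dlq_custom_codes adds further Elasticsearch response codes to that list
    dlq_custom_codes => [409]
  }
}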

It seems this falls under the same issue as Logstash throws "Incompatible Encodings" error when querying NVARCHAR-Fields from MSSQL-Server · Issue #14679 · elastic/logstash · GitHub, but my suggestion would be to use a Ruby code snippet that tries to encode the event as JSON and, if that raises an exception, logs it and drops the event.
I haven't tested it, but it might look something like this (it will be slightly more expensive due to the extra JSON marshalling done just to verify the event can be encoded):

ruby {
    init => "require 'json'"
    code => '
      begin
        event.to_json
      rescue => e
        # cancel drops the event; it will be lost from the pipeline
        event.cancel
        # log only the exception message, since serializing the whole event
        # could run into the same encoding error
        logger.info("dropped event that could not be encoded as JSON", "error" => e.message)
      end
    '
}

Excellent, thank you. I went through the debug output and identified the source of the issue: it looks like geoip is the culprit. From what I can tell, the encoding issues come from certain region_name values that get introduced when an IP is looked up in the open MaxMind database.

This data should already be JSON and UTF-8, right? Would the code snippet be of any help here?

FYI, to get the data ingested I commented out all of the geoip lookup code. Without the enrichment everything runs perfectly well, and there are no more encoding errors.

Some data examples from the geoip output that I think are generating the encoding error: one instance of each of the values below appeared in two separate bulk ingest error debug logs, and I didn't see any other odd characters in the debug output. A single bad value causes the entire bulk request to error out, and there are about 75 log lines in each bulk request.

debug log 1 contained - "region_name"=>"Ōsaka"
debug log 2 contained - "city_name"=>"Québec"

It's weird to see those fields encoded like that.
Are you using the geoip filter?
Are you using a forced encoding in the codecs of the inputs? (See the sketch after these questions.)
What versions of Logstash, the JVM, and the OS are you using?
Is the Logstash process running with some environment variable that might change the locale/encoding?
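In the meantime, if the archive files are being read with the wrong default charset (the error mentions IBM437, the Windows console codepage), an untested sketch of what I mean by setting the encoding on the input codec, with a hypothetical file path:

input {
  file {
    path => "C:/archive/logs/*.log"         # hypothetical path to the archived logs
    # declare what the files are actually encoded in (UTF-8 shown here as an example)
    # so the message fields are not tagged with the locale's default encoding
    codec => plain { charset => "UTF-8" }
  }
}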

Could you also post, as a code snippet, a minimal pipeline that reproduces the problem so I can eventually try it on my side? We'll probably open a public issue.
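For example, a rough, untested skeleton; the input block would be whatever you already use for the archive ingest, pointed at a file containing just one or two of the offending lines, and clientip is a hypothetical field name:

input {
  file {
    path => "C:/archive/repro/offending-lines.log"   # hypothetical file with a couple of bad lines
  }
}
filter {
  geoip { source => "clientip" }   # assuming clientip holds the address being looked up
}
output {
  elasticsearch { hosts => ["https://localhost:9200"] }   # hypothetical host
}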
