I have a large set of archived data to ingest into Elasticsearch, and some of the log entries have encoding issues that build up and cause the Logstash ingest pipeline to stall. I have the DLQ enabled, but I still get indefinite retry errors that eventually kill the pipeline. Is there a way to configure the DLQ to handle these errors so they don't block the ingest? Since this is a comparatively small number of log entries, I'd like these messages dropped, or logged to the DLQ, so they don't block the larger (300+ GB) ingest.
[2022-11-13T14:19:08,872][ERROR][logstash.outputs.elasticsearch][main][5cb44decd2cbc2be7716f495d332a9e2a3a7489f1e19fa85d351fe7e9d0565c2] An unknown error occurred sending a bulk request to Elasticsearch (will retry indefinitely) {:message=>"incompatible encodings: IBM437 and UTF-8", :exception=>Encoding::CompatibilityError, :backtrace=>["org/jruby/ext/stringio/StringIO.java:1162:in `write'", "C:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.9.0-java/lib/logstash/outputs/elasticsearch/http_client.rb:142:in `block in bulk'", "org/jruby/RubyArray.java:1865:in `each'", "org/jruby/RubyEnumerable.java:1143:in `each_with_index'", "C:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.9.0-java/lib/logstash/outputs/elasticsearch/http_client.rb:125:in `bulk'", "C:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.9.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:300:in `safe_bulk'", "C:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.9.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:232:in `submit'", "C:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.9.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:177:in `retrying_submit'", "C:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.9.0-java/lib/logstash/outputs/elasticsearch.rb:369:in `multi_receive'", "org/logstash/config/ir/compiler/AbstractOutputDelegatorExt.java:121:in `multi_receive'", "C:/logstash/logstash-core/lib/logstash/java_pipeline.rb:301:in `block in start_workers'"]}
This error comes from the encoding of the bulk request while it is being built, i.e. before it is ever sent to Elasticsearch.
If the error originated from Elasticsearch, there is a setting to customize which response codes are routed to the DLQ.
But this one is coming straight from Logstash, so the DLQ never sees it.
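For reference (it won't help in this case, precisely because the failure happens before the request reaches Elasticsearch), the setting I mean is `dlq_custom_codes` on the elasticsearch output. A sketch, with an illustrative host and code:

```
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]   # illustrative
    # Extra HTTP response codes to route to the DLQ instead of retrying.
    # Document-level 400/404 failures already go to the DLQ by default
    # when dead_letter_queue.enable is set to true in logstash.yml.
    dlq_custom_codes => [409]
  }
}
```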
ruby {
  init => "require 'json'"
  code => "
    begin
      # Try to serialize the event here; an event with incompatible
      # encodings raises before it ever reaches the output's bulk request.
      event.to_json
    rescue => e
      event.cancel # drops the event; it will be lost
      logger.info('bad message: ', 'value' => event)
    end
  "
}
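If it helps to see the failure outside Logstash, here is a minimal plain-Ruby sketch (the strings are illustrative) that reproduces the same `Encoding::CompatibilityError` from the backtrace and shows one way to neutralize it:

```ruby
# Joining a stray IBM437-tagged string with non-ASCII UTF-8 raises
# Encoding::CompatibilityError, just like the StringIO write in the backtrace.
require 'json'

# Simulate a geoip-style value whose bytes were tagged with the wrong encoding.
bad = "Mérida".b.force_encoding('IBM437')

begin
  "región: " + bad # both strings contain non-ASCII bytes -> raises
rescue Encoding::CompatibilityError => e
  # Re-encoding to UTF-8 (replacing any unmappable bytes) makes it safe to emit.
  fixed = bad.encode('UTF-8', invalid: :replace, undef: :replace)
  puts({ 'region' => fixed }.to_json)
end
```

The same `encode('UTF-8', ...)` call is what you would apply inside a ruby filter to repair a field instead of dropping the whole event.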
Excellent - thank you. I went through debug logging and identified the source of the issue: geoip is the culprit. From what I can tell, the encoding issues come from certain region_name values that get introduced when an IP is looked up in the open MaxMind database.
This data should already be JSON and UTF-8, right? Would the code snippet be any help here?
FYI - to get the data ingested, I commented out all the geoip lookup code, and everything is going perfectly without the enrichment activity: no more encoding errors.
Some data examples from the geoip output that I think are generating the encoding error: one instance of each of the examples below appeared in two separate bulk-ingest error debug logs, and I didn't see any other odd characters in the debug output. A single bad entry causes the entire bulk request (about 75 log lines each) to error out.
It's weird to see those fields encoded like that.
Are you using the geoip filter?
Are you forcing an encoding in the codecs of your inputs?
What versions of Logstash, the JVM, and the OS are you using?
Is the Logstash process running with an environment variable that might change the locale/encoding?
Could you post a minimal pipeline as a code snippet so I can try to reproduce this on my side? We'll probably open a public issue.
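In the meantime, as a workaround sketch (the field handling is an assumption on my part; adjust to whatever geoip fields you actually keep), you could re-encode the geoip string fields to UTF-8 in a ruby filter before the elasticsearch output, instead of disabling the enrichment entirely:

```
filter {
  ruby {
    code => "
      # Re-encode any geoip string field that carries a non-UTF-8 encoding,
      # replacing unmappable bytes rather than letting the bulk request fail.
      geoip = event.get('geoip')
      if geoip.is_a?(Hash)
        geoip.each do |k, v|
          next unless v.is_a?(String) && v.encoding != Encoding::UTF_8
          event.set('[geoip][' + k + ']',
                    v.encode('UTF-8', invalid: :replace, undef: :replace))
        end
      end
    "
  }
}
```

That way the ~75-line bulk requests go through and only the odd characters are replaced, rather than losing whole events.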