Elasticsearch filter failing to start with Logstash because the cluster is not available yet

Hi everyone!

Hope you're doing okay. I'm making this post because I couldn't find anything online about the following issue, and I wanted to make sure I'm not overlooking something simple.

I have the stack installed in a series of VMs on AWS (one VM for each installation), and I have an input pipeline that uses the Elasticsearch filter to enrich some logs with data from the cluster itself.
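To give an idea of what I mean, the enrichment part of the pipeline looks roughly like this (the hosts, query and fields below are placeholders based on the plugin docs, not my actual config):

    filter {
      elasticsearch {
        # Placeholder values, not the real config
        hosts => ["http://es-node-1:9200"]
        query => "type:start AND operation:%{[opid]}"
        fields => { "@timestamp" => "started" }
      }
    }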

The problem is that when Logstash first starts, this pipeline fails with the following error:

[2021-10-27T07:35:01,103][ERROR][logstash.javapipeline    ][<pipeline name>] Pipeline error {:pipeline_id=>"<pipeline name>", :exception=>#<Manticore::SocketException: Connection refused: connect>, :backtrace=>["fullpath/logstash/vendor/bundle/jruby/2.5.0/gems/manticore-0.7.0-java/lib/manticore/response.rb:37:in `block in initialize'", "fullpath/logstash/vendor/bundle/jruby/2.5.0/gems/manticore-0.7.0-java/lib/manticore/response.rb:79:in `call'", "fullpath/logstash/vendor/bundle/jruby/2.5.0/gems/manticore-0.7.0-java/lib/manticore/response.rb:274:in `call_once'", "fullpath/logstash/vendor/bundle/jruby/2.5.0/gems/manticore-0.7.0-java/lib/manticore/response.rb:158:in `code'", "fullpath/logstash/vendor/bundle/jruby/2.5.0/gems/elasticsearch-transport-5.0.5/lib/elasticsearch/transport/transport/http/manticore.rb:84:in `block in perform_request'", "fullpath/logstash/vendor/bundle/jruby/2.5.0/gems/elasticsearch-transport-5.0.5/lib/elasticsearch/transport/transport/base.rb:262:in `perform_request'", "fullpath/logstash/vendor/bundle/jruby/2.5.0/gems/elasticsearch-transport-5.0.5/lib/elasticsearch/transport/transport/http/manticore.rb:67:in `perform_request'", "fullpath/logstash/vendor/bundle/jruby/2.5.0/gems/elasticsearch-transport-5.0.5/lib/elasticsearch/transport/client.rb:131:in `perform_request'", "fullpath/logstash/vendor/bundle/jruby/2.5.0/gems/elasticsearch-api-5.0.5/lib/elasticsearch/api/actions/ping.rb:20:in `ping'", "fullpath/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-filter-elasticsearch-3.9.3/lib/logstash/filters/elasticsearch.rb:310:in `test_connection!'", "fullpath/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-filter-elasticsearch-3.9.3/lib/logstash/filters/elasticsearch.rb:117:in `register'", "org/logstash/config/ir/compiler/AbstractFilterDelegatorExt.java:75:in `register'", "fullpath/logstash/logstash-core/lib/logstash/java_pipeline.rb:228:in `block in register_plugins'", "org/jruby/RubyArray.java:1809:in `each'", "fullpath/logstash/logstash-core/lib/logstash/java_pipeline.rb:227:in `register_plugins'", "fullpath/logstash/logstash-core/lib/logstash/java_pipeline.rb:586:in `maybe_setup_out_plugins'", "fullpath/logstash/logstash-core/lib/logstash/java_pipeline.rb:240:in `start_workers'", "fullpath/logstash/logstash-core/lib/logstash/java_pipeline.rb:185:in `run'", "fullpath/logstash/logstash-core/lib/logstash/java_pipeline.rb:137:in `block in start'"], "pipeline.sources"=>["fullpath/pipelines/certa-estadisticas_logstash.conf"], :thread=>"#<Thread:0x174f411d run>"}
[2021-10-27T07:35:01,447][INFO ][logstash.javapipeline    ][<pipeline name>] Pipeline terminated {"pipeline.id"=>"<pipeline name>"}

As far as I have been able to debug, this happens because the ES nodes haven't started yet, and the Elasticsearch filter only seems to try to "resurrect" the connection 2 or 3 times before giving up, leaving the pipeline down afterwards.

The rest of the pipelines in the installation use the schedule setting on their inputs, so Logstash itself keeps running, just without this particular pipeline.
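For example, those pipelines use inputs that support the schedule option (cron syntax), something along these lines (again just a sketch with placeholder values):

    input {
      elasticsearch {
        hosts => ["http://es-node-1:9200"]
        query => '{ "query": { "match_all": {} } }'
        schedule => "*/5 * * * *"  # placeholder: run every 5 minutes
      }
    }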

Is there any way to configure that specific pipeline to keep retrying for longer (say 5 to 10 minutes)?

After all, the outputs do retry the connection forever by default; I don't understand why this filter should be able to halt the pipeline entirely until Logstash is restarted with the cluster already running.

If it helps at all: the cluster has 3 nodes, all running on separate VMs, and they start up at the same time as Logstash when the VMs get turned on.
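For now, the only workaround I can think of is delaying the Logstash service until the cluster answers, e.g. with a systemd drop-in along these lines (the file path and node URL are placeholders, and I haven't actually tested this yet):

    # /etc/systemd/system/logstash.service.d/wait-for-es.conf (hypothetical path)
    [Service]
    # Block Logstash startup until an ES node accepts connections; URL is a placeholder
    ExecStartPre=/bin/sh -c 'until curl -s -o /dev/null http://es-node-1:9200; do sleep 5; done'
    # Give the wait loop enough time before systemd gives up on the start
    TimeoutStartSec=600

But that feels like papering over the problem rather than fixing it.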

I don't want to be that guy, but has no one ever experienced this before? I feel it's ridiculous to simply stop the pipeline altogether because the cluster is unreachable at startup.

The plugin has no problem retrying indefinitely if the cluster becomes unavailable during its normal operation, so why does it stop the pipeline altogether if the cluster is unavailable during startup? I just feel there's a big inconsistency there.
