Force logstash to restart after "Pipeline aborted due to error"

Logstash v6.3.1 is logging this at startup and then getting stuck forever:

Sending Logstash's logs to /usr/share/logstash/logs which is now configured via log4j2.properties
[2021-04-30T13:05:39,258][ERROR][logstash.pipeline        ] Error registering plugin {:pipeline_id=>"main", :plugin=>"#<LogStash::OutputDelegator:0xde534a6>", :error=>"Connection refused - Failed to open TCP connection to 169.254.169.254:80 (Connection refused - connect(2) for \"169.254.169.254\" port 80)", :thread=>"#<Thread:0x7f00dfe4 run>"}
[2021-04-30T13:05:39,273][ERROR][logstash.pipeline        ] Pipeline aborted due to error {:pipeline_id=>"main", :exception=>#<Errno::ECONNREFUSED: Connection refused - Failed to open TCP connection to 169.254.169.254:80 (Connection refused - connect(2) for "169.254.169.254" port 80)>, :backtrace=>["org/jruby/ext/socket/RubyTCPSocket.java:134:in `initialize'", "org/jruby/RubyIO.java:1154:in `open'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http.rb:885:in `block in connect'", "org/jruby/ext/timeout/Timeout.java:149:in `timeout'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http.rb:883:in `connect'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http.rb:868:in `do_start'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http.rb:857:in `start'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http.rb:585:in `start'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http.rb:480:in `get_response'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http.rb:457:in `get'", "/app/logstash-output-cloudwatchlogs/lib/logstash/outputs/cloudwatchlogs.rb:116:in `register'", "org/logstash/config/ir/compiler/OutputStrategyExt.java:97:in `register'", "org/logstash/config/ir/compiler/OutputDelegatorExt.java:93:in `register'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:340:in `register_plugin'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:351:in `block in register_plugins'", "org/jruby/RubyArray.java:1734:in `each'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:351:in `register_plugins'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:728:in `maybe_setup_out_plugins'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:361:in `start_workers'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:288:in `run'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:248:in `block in start'"], :thread=>"#<Thread:0x7f00dfe4 run>"}
[2021-04-30T13:05:39,306][ERROR][logstash.agent           ] Failed to execute action {:id=>:main, :action_type=>LogStash::ConvergeResult::FailedAction, :message=>"Could not execute action: PipelineAction::Create<main>, action_result: false", :backtrace=>nil}

I know what causes this. There is a proxy service responsible for access to 169.254.169.254:80, and logstash sometimes starts up before the proxy service.

However, when logstash encounters this error, it gets stuck here forever: it never retries and never exits. How can I force logstash to restart when it encounters this error?

The retry behaviour depends on the plugin. What is the configuration for the plugin that is causing this?

The plugin is a probably thrice-forked and vendored copy of logstash-output-cloudwatchlogs.

There are four configuration blocks for it pointing at four different log groups, but they all look basically like this:

            cloudwatchlogs {
                log_group_name => "operational-logs"
                log_stream_name => "%instance_id%"
                region => "us-west-2"
            }

This plugin is long dead (even though tons of people still use it and there's no replacement), so if the issue is with the plugin, I'm probably on my own to fix it. I haven't done any plugin development beyond hacking this thing enough to get it working in the first place a couple of years ago, but I'm going to guess the issue is somewhere in its register method?

Indeed, it is failing here, when it does Net::HTTP.get(URI.parse("http://169.254.169.254/latest/meta-data/instance-id")). Google will find you documents about diagnosing problems with connectivity to the AWS metadata service.
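If you do end up patching the vendored plugin, one low-effort option is to wrap that metadata fetch in a retry loop, so a slow-starting proxy makes register wait rather than abort immediately. A minimal sketch, assuming you can edit the plugin source: with_retries, fetch_instance_id, and the attempt/delay numbers are my own invention, not anything in the plugin.

```ruby
require "net/http"
require "uri"

# Hedged sketch: with_retries is a helper I made up, not part of the plugin.
# It retries a block that raises connection errors, pausing between attempts.
def with_retries(attempts: 10, delay: 5)
  tries = 0
  begin
    yield
  rescue Errno::ECONNREFUSED, Net::OpenTimeout
    tries += 1
    raise if tries >= attempts  # out of retries: let register fail as before
    sleep delay
    retry
  end
end

# The call that currently aborts the pipeline, wrapped so transient
# "proxy not up yet" failures are retried instead of raised.
def fetch_instance_id
  with_retries do
    Net::HTTP.get(URI.parse("http://169.254.169.254/latest/meta-data/instance-id"))
  end
end
```

Tune attempts and delay to comfortably exceed how long the proxy takes to come up; if the last attempt still fails, register aborts exactly as it does today.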

When I spin up an AWS instance I have a host (/32) route to it:

169.254.169.254 0.0.0.0         255.255.255.255 UH        0 0          0 eth0

I know it does not help you, but the architecture of the plugin is not great. If you look at the tcp or http outputs, they do very little in their register methods. All the real initialization is spun off into a thread that retries if a connection is down. (Although the tcp one is not ideal, since it never consumes the shutdown event if it cannot connect.)
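For reference, that deferred-initialization pattern looks roughly like this. This is a sketch only: every name in it (DeferredCloudwatchOutput, ready?, the injectable fetcher) is invented for illustration and is not the actual tcp/http output code.

```ruby
require "net/http"
require "uri"

# Hedged sketch of deferred initialization: register does no network I/O,
# so a dead endpoint can never abort the pipeline at startup.
class DeferredCloudwatchOutput
  # Allow the metadata fetch to be injected so the retry logic is testable.
  def initialize(fetcher = -> { Net::HTTP.get(URI.parse("http://169.254.169.254/latest/meta-data/instance-id")) })
    @fetcher = fetcher
    @ready = false
    @stopping = false
  end

  # register just starts a connector thread that retries until the
  # endpoint answers or the plugin is asked to shut down.
  def register
    @connector = Thread.new do
      until @ready || @stopping
        begin
          @instance_id = @fetcher.call
          @ready = true
        rescue Errno::ECONNREFUSED, Net::OpenTimeout
          sleep 0.2  # endpoint not up yet: wait and retry instead of raising
        end
      end
    end
  end

  def ready?
    @ready
  end

  # Unlike the stock tcp output, close always consumes the shutdown
  # request, even if we never managed to connect.
  def close
    @stopping = true
    @connector.join
  end
end
```

The key design point is that close works whether or not the connector ever succeeded, which is exactly the gap noted above in the tcp output.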

This issue provides some background on how plugins should be handling exceptions.

I was able to reproduce this issue by running logstash locally, and I tracked it down to a recent change we made to attempt to have logstash reload its configuration.

Using this simple pipeline config:

input {
    stdin {}
}

output {
    cloudwatchlogs {
        log_group_name => "hopcloud-operational-logs"
        log_stream_name => "%instance_id%"
        region => "us-west-2"
    }
}

I expect this to fail on my desktop because there's no EC2 metadata service on my desktop, obviously. And it does, with a very similar error (although it times out instead of getting a connection refused):

$ docker run -ti --rm -v "$(pwd)/pipeline.conf:/usr/share/logstash/pipeline/logstash.conf" our-repo/logstash:6.3.1-4 --log.level=warn
Sending Logstash's logs to /usr/share/logstash/logs which is now configured via log4j2.properties
[2021-05-03T21:17:56,400][ERROR][logstash.pipeline        ] Error registering plugin {:pipeline_id=>"main", :plugin=>"#<LogStash::OutputDelegator:0x5739af46>", :error=>"Failed to open TCP connection to 169.254.169.254:80 (execution expired)", :thread=>"#<Thread:0x52f6091a run>"}
[2021-05-03T21:17:56,408][ERROR][logstash.pipeline        ] Pipeline aborted due to error {:pipeline_id=>"main", :exception=>#<Net::OpenTimeout: Failed to open TCP connection to 169.254.169.254:80 (execution expired)>, :backtrace=>["org/jruby/ext/socket/RubyTCPSocket.java:112:in `initialize'", "org/jruby/RubyIO.java:1154:in `open'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http.rb:885:in `block in connect'", "org/jruby/ext/timeout/Timeout.java:149:in `timeout'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http.rb:883:in `connect'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http.rb:868:in `do_start'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http.rb:857:in `start'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http.rb:585:in `start'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http.rb:480:in `get_response'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http.rb:457:in `get'", "/app/logstash-output-cloudwatchlogs/lib/logstash/outputs/cloudwatchlogs.rb:116:in `register'", "org/logstash/config/ir/compiler/OutputStrategyExt.java:97:in `register'", "org/logstash/config/ir/compiler/OutputDelegatorExt.java:93:in `register'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:340:in `register_plugin'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:351:in `block in register_plugins'", "org/jruby/RubyArray.java:1734:in `each'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:351:in `register_plugins'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:728:in `maybe_setup_out_plugins'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:361:in `start_workers'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:288:in `run'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:248:in `block in start'"], :thread=>"#<Thread:0x52f6091a run>"}
[2021-05-03T21:17:56,425][ERROR][logstash.agent           ] Failed to execute action {:id=>:main, :action_type=>LogStash::ConvergeResult::FailedAction, :message=>"Could not execute action: PipelineAction::Create<main>, action_result: false", :backtrace=>nil}

But then, it dies! (With exit code 0, for some reason.) Which is what I want.

But in production, we recently added the flags --config.reload.automatic --config.reload.interval=30s. When I add those to my local copy, I get the same error output:

$ docker run -ti --rm -v "$(pwd)/pipeline.conf:/usr/share/logstash/pipeline/logstash.conf" our-repo/logstash:6.3.1-4 --log.level warn --config.reload.automatic --config.reload.interval=30s
Sending Logstash's logs to /usr/share/logstash/logs which is now configured via log4j2.properties
[2021-05-03T21:35:37,956][ERROR][logstash.pipeline        ] Error registering plugin {:pipeline_id=>"main", :plugin=>"#<LogStash::OutputDelegator:0x1bfcd999>", :error=>"Failed to open TCP connection to 169.254.169.254:80 (execution expired)", :thread=>"#<Thread:0xd75a9be run>"}
[2021-05-03T21:35:37,965][ERROR][logstash.pipeline        ] Pipeline aborted due to error {:pipeline_id=>"main", :exception=>#<Net::OpenTimeout: Failed to open TCP connection to 169.254.169.254:80 (execution expired)>, :backtrace=>["org/jruby/ext/socket/RubyTCPSocket.java:112:in `initialize'", "org/jruby/RubyIO.java:1154:in `open'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http.rb:885:in `block in connect'", "org/jruby/ext/timeout/Timeout.java:149:in `timeout'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http.rb:883:in `connect'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http.rb:868:in `do_start'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http.rb:857:in `start'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http.rb:585:in `start'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http.rb:480:in `get_response'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http.rb:457:in `get'", "/app/logstash-output-cloudwatchlogs/lib/logstash/outputs/cloudwatchlogs.rb:116:in `register'", "org/logstash/config/ir/compiler/OutputStrategyExt.java:97:in `register'", "org/logstash/config/ir/compiler/OutputDelegatorExt.java:93:in `register'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:340:in `register_plugin'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:351:in `block in register_plugins'", "org/jruby/RubyArray.java:1734:in `each'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:351:in `register_plugins'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:728:in `maybe_setup_out_plugins'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:361:in `start_workers'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:288:in `run'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:248:in `block in start'"], :thread=>"#<Thread:0xd75a9be run>"}
[2021-05-03T21:35:37,988][ERROR][logstash.agent           ] Failed to execute action {:id=>:main, :action_type=>LogStash::ConvergeResult::FailedAction, :message=>"Could not execute action: PipelineAction::Create<main>, action_result: false", :backtrace=>nil}
^C[2021-05-03T21:42:56,591][WARN ][logstash.runner          ] SIGINT received. Shutting down.

But as you can see, it hangs until I kill it with Ctrl+C.

This likely means that this isn't a new problem, just one that used to resolve itself: without the reload flags, logstash exits on this error, and our supervisor restarts it until the proxy is up. So I'll pull these flags back out and we'll go back to the old, adequate behavior.