Logstash SQS output failing makes pipeline stop working entirely

I have a logstash pipeline that uses an SQS output. I'm seeing errors where logstash will just stop working if it cannot connect to SQS. The logs seem to indicate that logstash attempts to restart, fails and then just does nothing.

Is there a way to make it try again until it can connect to SQS? It's only happening intermittently and every time the issue is resolved by manually restarting logstash.

Truncated logs below (too big for a post) and full logs here

[2020-05-06T18:48:35,441][FATAL][logstash.runner          ] An unexpected error occurred! {:error=>#<Seahorse::Client::NetworkingError: Failed to open TCP connection to sqs.us-west-2.amazonaws.com:443
[2020-05-06T18:48:38,132][ERROR][org.logstash.Logstash    ] java.lang.IllegalStateException: Logstash stopped processing because of an error: (SystemExit) exit
[2020-05-06T18:49:20,643][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"6.4.0"}
[2020-05-06T18:49:47,386][INFO ][logstash.pipeline        ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>20, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2020-05-06T18:50:01,463][ERROR][logstash.pipeline        ] Error registering plugin {:pipeline_id=>"main", :plugin=>"#<LogStash::OutputDelegator:0x615710d4>", :error=>"Failed to open TCP connection t
[2020-05-06T18:50:01,475][ERROR][logstash.pipeline        ] Pipeline aborted due to error {:pipeline_id=>"main", :exception=>#<Seahorse::Client::NetworkingError: Failed to open TCP connection to sqs.u
[2020-05-06T18:50:01,512][ERROR][logstash.agent           ] Failed to execute action {:id=>:main, :action_type=>LogStash::ConvergeResult::FailedAction, :message=>"Could not execute action: PipelineAct
[2020-05-06T18:50:01,992][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}

The plugin makes one attempt to connect to SQS during initialization. If that fails it does not retry and will not connect, so nothing happens.
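
For what it's worth, one workaround that leaves the plugin alone is to wait until the SQS endpoint is reachable before starting (or restarting) Logstash. Below is a minimal sketch of that idea in Python. It is not part of Logstash or the sqs plugin: the `wait_for_endpoint` helper, its parameters, and the backoff values are all made up for illustration. It only helps with the startup failure shown in the logs and does not change how the plugin behaves once the pipeline is running.

```python
import socket
import time


def wait_for_endpoint(host, port, attempts=5, base_delay=1.0,
                      connect=socket.create_connection, sleep=time.sleep):
    """Try to open a TCP connection, retrying with exponential backoff.

    Hypothetical helper: returns True as soon as one connection attempt
    succeeds, or False if all attempts fail.
    """
    for attempt in range(attempts):
        try:
            # Open and immediately close a TCP connection as a reachability check.
            conn = connect((host, port), timeout=5)
            conn.close()
            return True
        except OSError:
            # Sleep 1x, 2x, 4x, ... base_delay between attempts,
            # but not after the final failed attempt.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    return False
```

A wrapper script could call `wait_for_endpoint("sqs.us-west-2.amazonaws.com", 443)` (the endpoint from the logs above) and only launch Logstash once it returns `True`.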

Is there any way to make it try more than once? (Other than editing the Ruby of the plugin itself, obviously.)

I'm seeing issues where the machines seem to have a temporary network issue, then just stop working altogether. I feel like I've seen different configurations that have an ES output handle this sort of issue gracefully, but I don't have any logs on hand to back up that feeling.

Yes, the elasticsearch output has code to retry the connection. The sqs output does not.

hmm. Ok thanks. Good to know I wasn't just missing the option somewhere.
