Over the past week it has become a daily occurrence: we see a drop in logs reaching Elasticsearch, caused by blocked pipelines in our Logstash stack.
We run one Logstash cluster that accepts logs and pushes them into an SQS queue, and a second cluster that reads from the queue, filters, and pushes to Elasticsearch. This particular error occurs on the first cluster, and is only resolved by restarting the Logstash process (container).
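For context, the indexer tier is a conventional SQS-to-Elasticsearch pipeline; a simplified sketch (the Elasticsearch host is a placeholder and the filter block is elided):

input {
  sqs {
    queue => "${SQS_OUTPUT_QUEUE}"  # the queue the shipper tier writes to
    region => "${AWS_REGION}"
  }
}
# filter { ... }  # site-specific filtering happens on this tier
output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]  # placeholder
  }
}

The error on the shipper tier looks like this: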
{:timestamp=>"2016-06-14T13:04:34.272000+0000", :message=>"Failed to flush outgoing items", :outgoing_count=>5, :exception=>"AWS::Errors::Base", :backtrace=>["/opt/logstash/vendor/bundle/jruby/1.9/gems/aws-sdk-v1-1.66.0/lib/aws/core/client.rb:375:in `return_or_raise'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/aws-sdk-v1-1.66.0/lib/aws/core/client.rb:476:in `client_request'", "(eval):3:in `send_message_batch'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/aws-sdk-v1-1.66.0/lib/aws/sqs/queue.rb:551:in `batch_send'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-sqs-2.0.4/lib/logstash/outputs/sqs.rb:129:in `flush'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.22/lib/stud/buffer.rb:219:in `buffer_flush'", "org/jruby/RubyHash.java:1342:in `each'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.22/lib/stud/buffer.rb:216:in `buffer_flush'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.22/lib/stud/buffer.rb:193:in `buffer_flush'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.22/lib/stud/buffer.rb:159:in `buffer_receive'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-sqs-2.0.4/lib/logstash/outputs/sqs.rb:121:in `receive'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.1-java/lib/logstash/outputs/base.rb:83:in `multi_receive'", "org/jruby/RubyArray.java:1613:in `each'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.1-java/lib/logstash/outputs/base.rb:83:in `multi_receive'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.1-java/lib/logstash/output_delegator.rb:130:in `worker_multi_receive'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.1-java/lib/logstash/output_delegator.rb:129:in `worker_multi_receive'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.1-java/lib/logstash/output_delegator.rb:114:in `multi_receive'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.1-java/lib/logstash/pipeline.rb:301:in `output_batch'", "org/jruby/RubyHash.java:1342:in `each'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.1-java/lib/logstash/pipeline.rb:301:in `output_batch'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.1-java/lib/logstash/pipeline.rb:232:in `worker_loop'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.1-java/lib/logstash/pipeline.rb:201:in `start_workers'"], :level=>:warn}
{:timestamp=>"2016-06-14T13:04:53.142000+0000", :message=>"Lumberjack input: the pipeline is blocked, temporary refusing new connection.", :level=>:warn}
{:timestamp=>"2016-06-14T13:04:53.534000+0000", :message=>"Beats input: the pipeline is blocked, temporary refusing new connection.", :reconnect_backoff_sleep=>0.5, :level=>:warn}
the sqs output conf:
output {
  sqs {
    batch_events => 5
    queue => "${SQS_OUTPUT_QUEUE}"
    region => "${AWS_REGION}"
  }
}
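For reference, SQS's SendMessageBatch call accepts at most ten messages and 262,144 bytes (256 KB) of combined payload, so with batch_events => 5 a flush fails whenever the five buffered events together exceed 256 KB, i.e. roughly 52 KB per event on average. If oversized batches turn out to be the cause, one option short of disabling batching, sketched against the same 2.0.x plugin options, is a smaller batch:

output {
  sqs {
    # Fewer events per SendMessageBatch call makes it less likely the
    # combined payload crosses the 256 KB ceiling; a single event over
    # 256 KB will still fail, batched or not.
    batch_events => 2
    queue => "${SQS_OUTPUT_QUEUE}"
    region => "${AWS_REGION}"
  }
}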
the beats input conf:
input {
  beats {
    port => 5044
    ssl => true
    ssl_certificate => "/etc/pki/tls/certs/logstash-forwarder/lumberjack.crt"
    ssl_key => "/etc/pki/tls/private/logstash-forwarder/lumberjack.key"
  }
}
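The "pipeline is blocked" warnings come from the input-side circuit breaker: once the SQS output stalls, the beats and lumberjack inputs refuse new connections rather than buffer indefinitely. If the stalls are short, raising the breaker's patience can ride them out; the sketch below assumes the congestion_threshold option of the 2.x beats input (worth verifying against the installed plugin version) and only treats the symptom, not the failed flush:

input {
  beats {
    port => 5044
    ssl => true
    ssl_certificate => "/etc/pki/tls/certs/logstash-forwarder/lumberjack.crt"
    ssl_key => "/etc/pki/tls/private/logstash-forwarder/lumberjack.key"
    # Seconds to tolerate a blocked pipeline before refusing new
    # connections (the 2.x plugin's default is 5, as far as I can tell).
    congestion_threshold => 30
  }
}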
These nodes do not perform any filtering, just input -> queue.
The Dockerfile:
FROM logstash:2.3.1
ENV SERVICE_NAME=logstash
# Shipper-tier pipeline config baked into the image.
COPY ./config/shipper /opt/config
# --allow-env enables ${VAR} substitution (e.g. SQS_OUTPUT_QUEUE) in the config.
CMD ["--allow-env", "-f", "/opt/config"]
Unfortunately, the error is not very helpful: the exception surfaces only as AWS::Errors::Base, though the backtrace shows the flush failing inside send_message_batch. I am assuming the underlying error is BatchRequestTooLong, but that is just a guess. For now, I will disable batch sending.
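A minimal sketch of that change, assuming the 2.0.x plugin's boolean batch option:

output {
  sqs {
    # Use SendMessage per event instead of SendMessageBatch; sidesteps
    # the combined-payload limit at the cost of one API call per event
    # (a single event over 256 KB will still be rejected).
    batch => false
    queue => "${SQS_OUTPUT_QUEUE}"
    region => "${AWS_REGION}"
  }
}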