Over the past week it has become a daily occurrence: we see a drop in logs reaching Elasticsearch, caused by blocked pipelines in our Logstash stack.
We run one Logstash cluster that accepts logs and pushes them into an SQS queue, and a second cluster that reads from the queue, filters, and pushes to Elasticsearch. This particular error occurs on the first cluster, and is only resolved by restarting the Logstash process (container).
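For context, the indexer tier is a conventional SQS-to-Elasticsearch pipeline; a simplified sketch (the Elasticsearch host is a placeholder and the filter block is elided):

input {
  sqs {
    queue => "${SQS_OUTPUT_QUEUE}"  # the queue the shipper tier writes to
    region => "${AWS_REGION}"
  }
}
# filter { ... }  # site-specific filtering happens on this tier
output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]  # placeholder
  }
}

The error on the shipper tier looks like this: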
{:timestamp=>"2016-06-14T13:04:34.272000+0000", :message=>"Failed to flush outgoing items", :outgoing_count=>5, :exception=>"AWS::Errors::Base", :backtrace=>["/opt/logstash/vendor/bundle/jruby/1.9/gems/aws-sdk-v1-1.66.0/lib/aws/core/client.rb:375:in `return_or_raise'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/aws-sdk-v1-1.66.0/lib/aws/core/client.rb:476:in `client_request'", "(eval):3:in `send_message_batch'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/aws-sdk-v1-1.66.0/lib/aws/sqs/queue.rb:551:in `batch_send'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-sqs-2.0.4/lib/logstash/outputs/sqs.rb:129:in `flush'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.22/lib/stud/buffer.rb:219:in `buffer_flush'", "org/jruby/RubyHash.java:1342:in `each'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.22/lib/stud/buffer.rb:216:in `buffer_flush'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.22/lib/stud/buffer.rb:193:in `buffer_flush'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.22/lib/stud/buffer.rb:159:in `buffer_receive'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-sqs-2.0.4/lib/logstash/outputs/sqs.rb:121:in `receive'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.1-java/lib/logstash/outputs/base.rb:83:in `multi_receive'", "org/jruby/RubyArray.java:1613:in `each'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.1-java/lib/logstash/outputs/base.rb:83:in `multi_receive'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.1-java/lib/logstash/output_delegator.rb:130:in `worker_multi_receive'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.1-java/lib/logstash/output_delegator.rb:129:in `worker_multi_receive'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.1-java/lib/logstash/output_delegator.rb:114:in `multi_receive'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.1-java/lib/logstash/pipeline.rb:301:in `output_batch'", "org/jruby/RubyHash.java:1342:in `each'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.1-java/lib/logstash/pipeline.rb:301:in `output_batch'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.1-java/lib/logstash/pipeline.rb:232:in `worker_loop'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.1-java/lib/logstash/pipeline.rb:201:in `start_workers'"], :level=>:warn}
{:timestamp=>"2016-06-14T13:04:53.142000+0000", :message=>"Lumberjack input: the pipeline is blocked, temporary refusing new connection.", :level=>:warn}
{:timestamp=>"2016-06-14T13:04:53.534000+0000", :message=>"Beats input: the pipeline is blocked, temporary refusing new connection.", :reconnect_backoff_sleep=>0.5, :level=>:warn}
the sqs output conf:
output {
  sqs {
    batch_events => 5
    queue => "${SQS_OUTPUT_QUEUE}"
    region => "${AWS_REGION}"
  }
}
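For reference, SQS's SendMessageBatch call accepts at most ten messages and 262,144 bytes (256 KB) of combined payload, so with batch_events => 5 a flush fails whenever the five buffered events together exceed 256 KB, i.e. roughly 52 KB per event on average. If oversized batches turn out to be the cause, one option short of disabling batching, sketched against the same 2.0.x plugin options, is a smaller batch:

output {
  sqs {
    # Fewer events per SendMessageBatch call makes it less likely the
    # combined payload crosses the 256 KB ceiling; a single event over
    # 256 KB will still fail, batched or not.
    batch_events => 2
    queue => "${SQS_OUTPUT_QUEUE}"
    region => "${AWS_REGION}"
  }
}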
the beats input conf:
input {
  beats {
    port => 5044
    ssl => true
    ssl_certificate => "/etc/pki/tls/certs/logstash-forwarder/lumberjack.crt"
    ssl_key => "/etc/pki/tls/private/logstash-forwarder/lumberjack.key"
  }
}
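The "pipeline is blocked" warnings come from the input-side circuit breaker: once the SQS output stalls, the beats and lumberjack inputs refuse new connections rather than buffer indefinitely. If the stalls are short, raising the breaker's patience can ride them out; the sketch below assumes the congestion_threshold option of the 2.x beats input (worth verifying against the installed plugin version) and only treats the symptom, not the failed flush:

input {
  beats {
    port => 5044
    ssl => true
    ssl_certificate => "/etc/pki/tls/certs/logstash-forwarder/lumberjack.crt"
    ssl_key => "/etc/pki/tls/private/logstash-forwarder/lumberjack.key"
    # Seconds to tolerate a blocked pipeline before refusing new
    # connections (the 2.x plugin's default is 5, as far as I can tell).
    congestion_threshold => 30
  }
}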
These nodes do not perform any filtering, just input -> queue.
The Dockerfile:
FROM logstash:2.3.1
ENV SERVICE_NAME=logstash
# Shipper-tier pipeline config baked into the image.
COPY ./config/shipper /opt/config
# --allow-env enables ${VAR} substitution (e.g. SQS_OUTPUT_QUEUE) in the config.
CMD ["--allow-env", "-f", "/opt/config"]
Unfortunately, the error is not very helpful: the exception surfaces only as AWS::Errors::Base, though the backtrace shows the flush failing inside send_message_batch. I am assuming the underlying error is BatchRequestTooLong, but that is just a guess. For now, I will disable batch sending.
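A minimal sketch of that change, assuming the 2.0.x plugin's boolean batch option:

output {
  sqs {
    # Use SendMessage per event instead of SendMessageBatch; sidesteps
    # the combined-payload limit at the cost of one API call per event
    # (a single event over 256 KB will still be rejected).
    batch => false
    queue => "${SQS_OUTPUT_QUEUE}"
    region => "${AWS_REGION}"
  }
}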