Logstash rabbitmq_input connection recovery fails

I am facing some issue that logstash does not want to reconnect to the rabbitMQ after losing the connection somehow.

I am using two logstash instances on two different VMs, these two logstash instances listen to the same RabbitMQ queue.

One logstash experienced an error and stopped the data ingestion. while the other one was not affected at all. Before, both logstashes had been running for around 12 hours.

Both machines are sharing the same configuration, located in the same domain, listened to the same queue, but they reacted differently.

I restarted the affected logstash service, and everything works perfect since then.

I found this error in the logs,

Error: #method<channel.close>(reply-code=406, reply-text=PRECONDITION_FAILED - unknown delivery tag 665, class-id=60, method-id=80) Exception: MarchHare::ChannelAlreadyClosed Stack: /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/march_hare-4.1.1-java/lib/march_hare/exceptions.rb:121:in `convert_and_reraise'

After the error above in the logs, I found those retries, and they are updated every second

Error while setting up connection for rabbitmq input! Will retry. {:message=>"#method<channel.close>(reply-code=406, reply-text=PRECONDITION_FAILED - unknown delivery tag 665, class-id=60, method-id=80)", :class=>"MarchHare::ChannelAlreadyClosed", :location=>"/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/march_hare-4.1.1-java/lib/march_hare/exceptions.rb:121:in `convert_and_reraise'"}

It looks like the logstash tried to restore the connection every second but it did not succeed, until I restarted the logstash service.

I also found a similar error below in the logs around two hours before the above issue occured,

Error: MarchHare::ChannelAlreadyClosed Exception: MarchHare::ChannelAlreadyClosed Stack: /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/march_hare-4.1.1-java/lib/march_hare/exceptions.rb:121:in `convert_and_reraise'

After this error I also noticed a few retries like these, but after a few retries the connection restored.

RabbitMQ connection was closed! {:url=>[redacted], :automatic_recovery=>true, :cause=>com.rabbitmq.client.ShutdownSignalException: connection error}

In a nutshell, the logstash met some connection issue with RabbitMQ, the first time it restored, the second time the recovery failed. The only difference between these two connection issue is 'reply-code=406, reply-text=PRECONDITION_FAILED - unknown delivery tag 665, class-id=60, method-id=80'.

I have two questions,

  1. What is the difference between these two connection errors? Explicitly, what does the 'reply-code=406' mean?
  2. Why did the restoration succeed the first time but failed the second?

The simple configuration that I am using looks like this,

input { 
    rabbitmq {
        host => [redacted]
        port => [redacted]
        vhost => 'xxx'
        passive => true
        queue => [redacted]
        ssl => true
        user =>  [redacted]
        password => [redacted]
        prefetch_count => 2
    }
}

It is an AMQP reply code that means "PRECONDITION_FAILED". What precondition? I think it is a precondition that a channel has to be open in order to call close on it (ChannelAlreadyClosed).

That pretty much has to be a bug. Not sure if it is in the rabbitmq input, or in the march_hare library that the input wraps.

Looking at this issue the comments suggest it is in the mixin, which is the wrapper around march hare than is used by both the input and output. But the code changes for this PR, which is related, were done in the output, not the mixin.

If it is 100% reproducible then I would open an issue against the input. The problem with resolving last time was that nobody at elastic could reproduce it.

Thanks for the information.

The logstash that we are using is 7.8.0, and it's running on Ubuntu 18.04.4

Almost a week after the restart, we are getting a similar issue again. All of sudden one of the logstash servers stopped the connection.

Within 20 minutes, we got three times the following messages,

[2020-09-10T07:27:07,393][ERROR][logstash.javapipeline    ][redacted][redacted] A plugin had an unrecoverable error. Will restart this plugin.
  Pipeline_id:redacted
  Plugin: <LogStash::Inputs::RabbitMQ vhost=>redacted, codec=><LogStash::Codecs::GzipLines id=>redacted, enable_metric=>true, charset=>"UTF-8">, password=><password>, port=>redacted, prefetch_count=>2, host=>[redacted], id=>redacted, passive=>true, ssl=>true, user=>redacted, queue=>redacted, enable_metric=>true, threads=>1, ssl_version=>"TLSv1.2", automatic_recovery=>true, connect_retry_interval=>1, durable=>false, auto_delete=>false, exclusive=>false, ack=>true, key=>"redacted", subscription_retry_interval_seconds=>5, metadata_enabled=>false>
  Error: MarchHare::ChannelAlreadyClosed
  Exception: MarchHare::ChannelAlreadyClosed
  Stack: /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/march_hare-4.1.1-java/lib/march_hare/exceptions.rb:121:in `convert_and_reraise'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/march_hare-4.1.1-java/lib/march_hare/channel.rb:996:in `converting_rjc_exceptions_to_ruby'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/march_hare-4.1.1-java/lib/march_hare/channel.rb:762:in `basic_ack'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/march_hare-4.1.1-java/lib/march_hare/channel.rb:677:in `block in ack'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/march_hare-4.1.1-java/lib/march_hare/channel.rb:1011:in `guarding_against_stale_delivery_tags'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/march_hare-4.1.1-java/lib/march_hare/channel.rb:676:in `ack'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-integration-rabbitmq-7.0.3-java/lib/logstash/inputs/rabbitmq.rb:264:in `internal_queue_consume!'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-integration-rabbitmq-7.0.3-java/lib/logstash/inputs/rabbitmq.rb:232:in `consume!'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-integration-rabbitmq-7.0.3-java/lib/logstash/inputs/rabbitmq.rb:178:in `run'
/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:345:in `inputworker'
/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:336:in `block in start_input'

After the last error occured, we found 4 retries as this,

[2020-09-10T07:27:11,465][WARN ][logstash.inputs.rabbitmq ][redacted][redacted] Error while setting up connection for rabbitmq input! Will retry. {:message=>"MarchHare::ChannelAlreadyClosed", :class=>"MarchHare::ChannelAlreadyClosed", :location=>"/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/march_hare-4.1.1-java/lib/march_hare/exceptions.rb:121:in convert_and_reraise'"}`

No more retries came afterwards, left us with the error,

[2020-09-10T07:27:25,643][WARN ][com.rabbitmq.client.impl.ChannelManager][redacted][redacted] Consumer dispatcher for channel didn't shutdown after waiting for 10000 ms
[2020-09-10T07:27:30,336][WARN ][logstash.inputs.rabbitmq ][redacted][redacted] RabbitMQ connection was closed! {:url=>"redacted", :automatic_recovery=>true, :cause=>com.rabbitmq.client.ShutdownSignalException: connection error}

It did not autorecover this time as well. We manually restarted the logstash pipeline and it is back to work now.