Logstash crashes when ES isn't available

Hello,

I turned off my Elasticsearch instance to test what happens when it fails (or isn't running). I want to handle this kind of behaviour by sending an alarm to Zabbix.

Then something weird happened. When I started Logstash, I got errors saying Elasticsearch isn't running, which is true. But when a message arrived and Logstash processed my output plugin:

    output {
        if ![tags] {
            stdout { }
            elasticsearch {
                hosts           => "${ES_HOSTS}"
                doc_as_upsert   => true
                document_id     => "%{[@metadata][fingerprint]}"
                index           => "idx"
            }    
        }
    }

It printed out the message's content and then:

[2018-05-17T16:44:18,004][FATAL][logstash.runner          ] An unexpected error occurred! {:error=>#<NoMethodError: undefined method `<' for nil:NilClass>, :backtrace=>["/Users/casmeiron/Developer/tools/logstash-6.2.4/vendor/bundle/jruby/2.3.0/gems/logstash-output-elasticsearch-9.1.1-java/lib/logstash/outputs/elasticsearch/common.rb:222:in `get_event_type'", "/Users/casmeiron/Developer/tools/logstash-6.2.4/vendor/bundle/jruby/2.3.0/gems/logstash-output-elasticsearch-9.1.1-java/lib/logstash/outputs/elasticsearch/common.rb:47:in `event_action_tuple'", "/Users/casmeiron/Developer/tools/logstash-6.2.4/vendor/bundle/jruby/2.3.0/gems/logstash-output-elasticsearch-9.1.1-java/lib/logstash/outputs/elasticsearch/common.rb:36:in `block in multi_receive'", "org/jruby/RubyArray.java:2486:in `map'", "/Users/casmeiron/Developer/tools/logstash-6.2.4/vendor/bundle/jruby/2.3.0/gems/logstash-output-elasticsearch-9.1.1-java/lib/logstash/outputs/elasticsearch/common.rb:36:in `multi_receive'", "/Users/casmeiron/Developer/tools/logstash-6.2.4/logstash-core/lib/logstash/output_delegator_strategies/shared.rb:13:in `multi_receive'", "/Users/casmeiron/Developer/tools/logstash-6.2.4/logstash-core/lib/logstash/output_delegator.rb:49:in `multi_receive'", "/Users/casmeiron/Developer/tools/logstash-6.2.4/logstash-core/lib/logstash/pipeline.rb:477:in `block in output_batch'", "org/jruby/RubyHash.java:1343:in `each'", "/Users/casmeiron/Developer/tools/logstash-6.2.4/logstash-core/lib/logstash/pipeline.rb:476:in `output_batch'", "/Users/casmeiron/Developer/tools/logstash-6.2.4/logstash-core/lib/logstash/pipeline.rb:428:in `worker_loop'", "/Users/casmeiron/Developer/tools/logstash-6.2.4/logstash-core/lib/logstash/pipeline.rb:386:in `block in start_workers'"]}
[2018-05-17T16:44:18,035][ERROR][org.logstash.Logstash    ] java.lang.IllegalStateException: org.jruby.exceptions.RaiseException: (SystemExit) exit

If I start Elasticsearch and try the event again, everything works as expected. Is this the normal behaviour of Logstash? Shouldn't I be able to check [tags] to verify whether my event was successfully indexed?

Environment:
MacOSX Sierra (10.12.6)
java version "1.8.0"
Java(TM) SE Runtime Environment (build 1.8.0-b132)
Java HotSpot(TM) 64-Bit Server VM (build 25.0-b70, mixed mode)
Logstash 6.2.4

Thanks in advance!

@CasMeiron

There is a new version of the elasticsearch output plugin that improves the behaviour in these kinds of scenarios.

Try upgrading the elasticsearch output plugin to the latest version:

bin/logstash-plugin update logstash-output-elasticsearch

See the documentation for more details on upgrading plugins.
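
If you want to confirm which version you end up with, listing the installed plugin should show it (a quick check; the exact invocation and output format may vary slightly between Logstash versions):

bin/logstash-plugin list --verbose logstash-output-elasticsearch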

Thanks,

Rob

Hi, thanks for getting back to me.

I've tested the upgrade, but a few issues appeared:

  1. The Elasticsearch output plugin doesn't add anything to [tags] even after it fails to index (all nodes are OFF). How do I know whether the indexing worked or failed? Here's what I did:

    output {
        if ![tags] {
            elasticsearch {
                hosts           => "${ES_HOSTS}"
                doc_as_upsert   => true
                document_id     => "%{[@metadata][fingerprint]}"
                index           => "teste"
            }
        }

        if [tags] {
            stdout { } # never prints, even though ES is OFF
        }
    }
  2. It's impossible to stop Logstash after it attempts to index a message into Elasticsearch; I had to kill it, see below:
[2018-05-25T11:34:03,508][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://localhost:9200/, :path=>"/"}
[2018-05-25T11:34:03,512][WARN ][logstash.outputs.elasticsearch] Attempted to resurrect connection to dead ES instance, but got an error. {:url=>"http://localhost:9200/", :error_type=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError, :error=>"Elasticsearch Unreachable: [http://localhost:9200/][Manticore::SocketException] Connection refused"}
^C[2018-05-25T11:34:05,591][WARN ][logstash.runner          ] SIGINT received. Shutting down.
[2018-05-25T11:34:08,527][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://localhost:9200/, :path=>"/"}
[2018-05-25T11:34:08,530][WARN ][logstash.outputs.elasticsearch] Attempted to resurrect connection to dead ES instance, but got an error. {:url=>"http://localhost:9200/", :error_type=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError, :error=>"Elasticsearch Unreachable: [http://localhost:9200/][Manticore::SocketException] Connection refused"}
[2018-05-25T11:34:10,593][WARN ][logstash.runner          ] Received shutdown signal, but pipeline is still waiting for in-flight events
to be processed. Sending another ^C will force quit Logstash, but this may cause
data loss.
[2018-05-25T11:34:10,614][WARN ][logstash.shutdownwatcher ] {"inflight_count"=>8, "stalling_thread_info"=>{["LogStash::Filters::Mutate", {"remove_field"=>["message"], "id"=>"3dca6f301a96874f91e44fc656cfc92b03a1f4a5055fb8d4479c5ce9ddbaed2e"}]=>[{"thread_id"=>25, "name"=>nil, "current_call"=>"[...]/vendor/bundle/jruby/2.3.0/gems/logstash-output-elasticsearch-9.1.4-java/lib/logstash/outputs/elasticsearch/common.rb:36:in `sleep'"}, {"thread_id"=>26, "name"=>nil, "current_call"=>"[...]/vendor/bundle/jruby/2.3.0/gems/logstash-output-elasticsearch-9.1.4-java/lib/logstash/outputs/elasticsearch/common.rb:36:in `sleep'"}, {"thread_id"=>27, "name"=>nil, "current_call"=>"[...]/vendor/bundle/jruby/2.3.0/gems/logstash-output-elasticsearch-9.1.4-java/lib/logstash/outputs/elasticsearch/common.rb:36:in `sleep'"}, {"thread_id"=>28, "name"=>nil, "current_call"=>"[...]/vendor/bundle/jruby/2.3.0/gems/logstash-output-elasticsearch-9.1.4-java/lib/logstash/outputs/elasticsearch/common.rb:36:in `sleep'"}, {"thread_id"=>29, "name"=>nil, "current_call"=>"[...]/vendor/bundle/jruby/2.3.0/gems/logstash-output-elasticsearch-9.1.4-java/lib/logstash/outputs/elasticsearch/common.rb:36:in `sleep'"}, {"thread_id"=>30, "name"=>nil, "current_call"=>"[...]/vendor/bundle/jruby/2.3.0/gems/logstash-output-elasticsearch-9.1.4-java/lib/logstash/outputs/elasticsearch/common.rb:36:in `sleep'"}, {"thread_id"=>31, "name"=>nil, "current_call"=>"[...]/vendor/bundle/jruby/2.3.0/gems/logstash-output-elasticsearch-9.1.4-java/lib/logstash/outputs/elasticsearch/common.rb:36:in `sleep'"}, {"thread_id"=>32, "name"=>nil, "current_call"=>"[...]/vendor/bundle/jruby/2.3.0/gems/logstash-output-elasticsearch-9.1.4-java/lib/logstash/outputs/elasticsearch/common.rb:36:in `sleep'"}]}}
[2018-05-25T11:34:10,615][ERROR][logstash.shutdownwatcher ] The shutdown process appears to be stalled due to busy or blocked plugins. Check the logs for more information.
[2018-05-25T11:34:13,551][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://localhost:9200/, :path=>"/"}
[2018-05-25T11:34:13,554][WARN ][logstash.outputs.elasticsearch] Attempted to resurrect connection to dead ES instance, but got an error. {:url=>"http://localhost:9200/", :error_type=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError, :error=>"Elasticsearch Unreachable: [http://localhost:9200/][Manticore::SocketException] Connection refused"}
[2018-05-25T11:34:15,616][WARN ][logstash.shutdownwatcher ] {"inflight_count"=>8, "stalling_thread_info"=>{["LogStash::Filters::Mutate", {"remove_field"=>["message"], "id"=>"3dca6f301a96874f91e44fc656cfc92b03a1f4a5055fb8d4479c5ce9ddbaed2e"}]=>[{"thread_id"=>25, "name"=>nil, "current_call"=>"[...]/vendor/bundle/jruby/2.3.0/gems/logstash-output-elasticsearch-9.1.4-java/lib/logstash/outputs/elasticsearch/common.rb:36:in `sleep'"}, {"thread_id"=>26, "name"=>nil, "current_call"=>"[...]/vendor/bundle/jruby/2.3.0/gems/logstash-output-elasticsearch-9.1.4-java/lib/logstash/outputs/elasticsearch/common.rb:36:in `sleep'"}, {"thread_id"=>27, "name"=>nil, "current_call"=>"[...]/vendor/bundle/jruby/2.3.0/gems/logstash-output-elasticsearch-9.1.4-java/lib/logstash/outputs/elasticsearch/common.rb:36:in `sleep'"}, {"thread_id"=>28, "name"=>nil, "current_call"=>"[...]/vendor/bundle/jruby/2.3.0/gems/logstash-output-elasticsearch-9.1.4-java/lib/logstash/outputs/elasticsearch/common.rb:36:in `sleep'"}, {"thread_id"=>29, "name"=>nil, "current_call"=>"[...]/vendor/bundle/jruby/2.3.0/gems/logstash-output-elasticsearch-9.1.4-java/lib/logstash/outputs/elasticsearch/common.rb:36:in `sleep'"}, {"thread_id"=>30, "name"=>nil, "current_call"=>"[...]/vendor/bundle/jruby/2.3.0/gems/logstash-output-elasticsearch-9.1.4-java/lib/logstash/outputs/elasticsearch/common.rb:36:in `sleep'"}, {"thread_id"=>31, "name"=>nil, "current_call"=>"[...]/vendor/bundle/jruby/2.3.0/gems/logstash-output-elasticsearch-9.1.4-java/lib/logstash/outputs/elasticsearch/common.rb:36:in `sleep'"}, {"thread_id"=>32, "name"=>nil, "current_call"=>"[...]/vendor/bundle/jruby/2.3.0/gems/logstash-output-elasticsearch-9.1.4-java/lib/logstash/outputs/elasticsearch/common.rb:36:in `sleep'"}]}}
^C[2018-05-25T11:34:17,039][FATAL][logstash.runner          ] SIGINT received. Terminating immediately..
[2018-05-25T11:34:17,173][ERROR][org.logstash.Logstash    ] org.jruby.exceptions.ThreadKill

Thanks.

@CasMeiron,

Logstash is intended not to lose data: if an event has been received by Logstash, it will attempt to send that event to Elasticsearch and will not shut down until that message has been sent successfully - in this case, until the Elasticsearch instance is available to accept messages.

Hi Rob,

Thanks. So, for now, it's impossible to know whether indexing into Elasticsearch failed? I understand that Logstash won't shut down until the received events are indexed. But if the machine reboots, will the transformed messages still be delivered to the outputs after Logstash starts again? How does that work?
Since I'm using the MQTT input plugin with QoS=1, it would be better to leave the message/event unacknowledged (unack) in MQTT and have Logstash try to deliver it again once all outputs are available. Bottom line: shouldn't Logstash reject messages if it sees that Elasticsearch, configured as an output, is down? And how can I tell when and how many messages/events are waiting in this state? It's obscure at best.

I'm just trying to figure out the best way to make sure Logstash doesn't lose data and to send an alarm when something goes wrong (via the Zabbix output plugin). It would be nice to know when the index operation failed for some reason; ideally it would add something to [tags].

Thanks.

@CasMeiron

If indexing fails because Elasticsearch is unavailable, Logstash will queue up the message in memory until the Elasticsearch cluster is available again, but that message will be lost if, for example, the machine reboots before it is indexed. For further resiliency, you might want to look into the persistent queue feature, which offers improved event durability in the event of a node crash or reboot.
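
For reference, enabling the persistent queue is a change in logstash.yml along these lines (a minimal sketch; the size and path values are illustrative, not recommendations):

    # logstash.yml -- minimal sketch of enabling the persistent queue
    queue.type: persisted            # default is "memory"
    queue.max_bytes: 1024mb          # on-disk capacity; 1024mb is the default
    # path.queue defaults to <path.data>/queue; override only if you need a specific disk
    # path.queue: /var/lib/logstash/queue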

Hi @RobBavey,

Thanks for detailing how Logstash works. Unfortunately, the persistent queue feature doesn't seem to work when you have multiple pipelines configured, as mentioned here: Logstash with multiple pipeline doesn't process messages, and that's my use case.

@CasMeiron

Looking at that issue, I suspect the problem you are having with the persistent queue may be the value of the queue.max_bytes setting. It is set to 1024, which translates to a maximum queue size of 1024 bytes. Try increasing it to something higher; the default value is 1024mb.
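
Since you are running multiple pipelines, the queue settings can also be set per pipeline in pipelines.yml. A minimal sketch, where the pipeline id and config path are hypothetical placeholders:

    # pipelines.yml -- per-pipeline persistent queue settings (illustrative values)
    - pipeline.id: my-pipeline                              # hypothetical id
      path.config: "/etc/logstash/conf.d/my-pipeline.conf"  # hypothetical path
      queue.type: persisted
      queue.max_bytes: 1024mb                               # a bare "1024" is read as 1024 bytes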

Thanks, that fixed my issue! Sorry for my silly mistake.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.