Receiving logstash error after installing watcher: SERVICE_UNAVAILABLE/1/state not recovered / initialized SERVICE_UNAVAILABLE/2/no master


(Darin Fisher) #1

Hi,

I installed Watcher on my ES cluster and started receiving this error from my logstash servers:

{:timestamp=>"2015-10-21T10:56:48.613000-0700", :message=>"Failed to flush outgoing items", :outgoing_count=>37, :exception=>"Java::OrgElasticsearchClusterBlock::ClusterBlockException", :backtrace=>["org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(org/elasticsearch/cluster/block/ClusterBlocks.java:151)", "org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(org/elasticsearch/cluster/block/ClusterBlocks.java:141)", "org.elasticsearch.action.bulk.TransportBulkAction.executeBulk(org/elasticsearch/action/bulk/TransportBulkAction.java:215)", "org.elasticsearch.action.bulk.TransportBulkAction.access$000(org/elasticsearch/action/bulk/TransportBulkAction.java:67)", "org.elasticsearch.action.bulk.TransportBulkAction$1.onFailure(org/elasticsearch/action/bulk/TransportBulkAction.java:153)", "org.elasticsearch.action.support.TransportAction$ThreadedActionListener$2.run(org/elasticsearch/action/support/TransportAction.java:137)", "java.util.concurrent.ThreadPoolExecutor.runWorker(java/util/concurrent/ThreadPoolExecutor.java:1142)", "java.util.concurrent.ThreadPoolExecutor$Worker.run(java/util/concurrent/ThreadPoolExecutor.java:617)", "java.lang.Thread.run(java/lang/Thread.java:745)"], :level=>:warn}
{:timestamp=>"2015-10-21T10:57:49.637000-0700", :message=>"Got error to send bulk of actions: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];[SERVICE_UNAVAILABLE/2/no master];", :level=>:error}

There were not any corresponding errors in the ES logs.

I changed the Logstash ES output protocol to 'http' to fix the problem.

Why would I receive this error using the 'node' protocol?

Thank you.


(Magnus Bäck) #2

The ES client wasn't able to find a master node, most likely because it wasn't able to find the rest of the cluster. The cause of that situation can't be diagnosed without more information.


(Darin Fisher) #3

Each of the ES nodes, including the master, did show that it had joined.

What other information should I provide?

Thanks,
Darin


(Mark Walkom) #4

Does _cat/nodes list the client?


(Darin Fisher) #5

Yes, it does show up as a node in the cluster.


(Darin Fisher) #6

Before starting logstash:

[dfisher@ops-2 ~]$ curl -s -XGET 'http://mon-esm-1:9200/_cat/nodes'
mon-esd-3.dc1.fm-hosted.com 10.1.108.111 35 29 0.15 d - mon-esd-3
mon-esd-2.dc1.fm-hosted.com 10.1.108.103 35 29 0.11 d - mon-esd-2
mon-esm-1.dc1.fm-hosted.com 10.1.108.101 18 9 0.02 d * mon-esm-1
mon-esd-1.dc1.fm-hosted.com 10.1.108.102 24 8 0.10 d - mon-esd-1

After:
[dfisher@ops-2 ~]$ curl -s -XGET 'http://mon-esm-1:9200/_cat/nodes'
mon-esd-3.dc1.fm-hosted.com 10.1.108.111 35 29 0.06 d - mon-esd-3
mon-esd-2.dc1.fm-hosted.com 10.1.108.103 35 29 0.10 d - mon-esd-2
mon-esm-1.dc1.fm-hosted.com 10.1.108.101 18 9 0.01 d * mon-esm-1
mon-esd-1.dc1.fm-hosted.com 10.1.108.102 24 8 0.51 d - mon-esd-1
mon-esp-1.dc1.fm-hosted.com 10.1.108.104 27 c - logstash-mon-esp-1.dc1.fm-hosted.com-2515-11638

Then just over 2 minutes after starting logstash the log shows the following:
{:timestamp=>"2015-10-22T12:48:12.712000-0700", :message=>"Got error to send bulk of actions: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];[SERVICE_UNAVAILABLE/2/no master];", :level=>:error}
{:timestamp=>"2015-10-22T12:48:12.715000-0700", :message=>"Failed to flush outgoing items", :outgoing_count=>2027, :exception=>"Java::OrgElasticsearchClusterBlock::ClusterBlockException", :backtrace=>["org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(org/elasticsearch/cluster/block/ClusterBlocks.java:151)", "org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(org/elasticsearch/cluster/block/ClusterBlocks.java:141)", "org.elasticsearch.action.bulk.TransportBulkAction.executeBulk(org/elasticsearch/action/bulk/TransportBulkAction.java:215)", "org.elasticsearch.action.bulk.TransportBulkAction.access$000(org/elasticsearch/action/bulk/TransportBulkAction.java:67)", "org.elasticsearch.action.bulk.TransportBulkAction$1.onFailure(org/elasticsearch/action/bulk/TransportBulkAction.java:153)", "org.elasticsearch.action.support.TransportAction$ThreadedActionListener$2.run(org/elasticsearch/action/support/TransportAction.java:137)", "java.util.concurrent.ThreadPoolExecutor.runWorker(java/util/concurrent/ThreadPoolExecutor.java:1142)", "java.util.concurrent.ThreadPoolExecutor$Worker.run(java/util/concurrent/ThreadPoolExecutor.java:617)", "java.lang.Thread.run(java/lang/Thread.java:745)"], :level=>:warn}
{:timestamp=>"2015-10-22T12:49:13.781000-0700", :message=>"Got error to send bulk of actions: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];[SERVICE_UNAVAILABLE/2/no master];", :level=>:error}
{:timestamp=>"2015-10-22T12:49:13.781000-0700", :message=>"Failed to flush outgoing items", :outgoing_count=>2027, :exception=>"Java::OrgElasticsearchClusterBlock::ClusterBlockException", :backtrace=>["org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(org/elasticsearch/cluster/block/ClusterBlocks.java:151)", "org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(org/elasticsearch/cluster/block/ClusterBlocks.java:141)", "org.elasticsearch.action.bulk.TransportBulkAction.executeBulk(org/elasticsearch/action/bulk/TransportBulkAction.java:215)", "org.elasticsearch.action.bulk.TransportBulkAction.access$000(org/elasticsearch/action/bulk/TransportBulkAction.java:67)", "org.elasticsearch.action.bulk.TransportBulkAction$1.onFailure(org/elasticsearch/action/bulk/TransportBulkAction.java:153)", "org.elasticsearch.action.support.TransportAction$ThreadedActionListener$2.run(org/elasticsearch/action/support/TransportAction.java:137)", "java.util.concurrent.ThreadPoolExecutor.runWorker(java/util/concurrent/ThreadPoolExecutor.java:1142)", "java.util.concurrent.ThreadPoolExecutor$Worker.run(java/util/concurrent/ThreadPoolExecutor.java:617)", "java.lang.Thread.run(java/lang/Thread.java:745)"], :level=>:warn}
{:timestamp=>"2015-10-22T12:50:14.858000-0700", :message=>"Got error to send bulk of actions: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];[SERVICE_UNAVAILABLE/2/no master];", :level=>:error}


(Mark Walkom) #7

Try switching to the http protocol in the LS output.
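For reference, a minimal sketch of what that switch could look like in the Logstash output section. It assumes the Logstash 1.5-era elasticsearch output, where `protocol` and `host` are the option names (later releases changed them), and it reuses the `mon-esm-1` host name from the curl commands in this thread purely as an illustration; the actual output block in use here was not posted.

```
output {
  elasticsearch {
    # Assumption: Logstash 1.5-era option names.
    # "http" sends bulk requests over the REST API instead of
    # joining the cluster as a client node (the "node" protocol).
    protocol => "http"
    # Host name borrowed from the curl examples in this thread;
    # substitute any node that serves HTTP (port 9200 by default).
    host => "mon-esm-1"
  }
}
```

With the HTTP protocol, Logstash does not join the cluster as a node, so it does not need to find a master or have cluster plugins installed locally.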


(Darin Fisher) #8

Yes, the HTTP output does work and that is what I have had to do at this point.
I would prefer the node protocol and am still "curious" as to why this is not working anymore.

The only change I made was to install watcher.


(Mark Walkom) #9

Ahh, then you need the license plugin jar on your LS node.

This sort of thing is why LS 2.0 will default to HTTP: it's easier, and it's as fast as node/transport.


(Darin Fisher) #10

That's good to know.

Thank you so much for your help!

