ES2.0 globalBlockedException


(Sukrit Dasgupta) #1

I am trying to send LS2.0 output to ES2.0 and am seeing cluster block (`globalBlockedException`) errors of these kinds:

    [2015-11-20 12:27:54,436][INFO ][rest.suppressed          ] /_template/logstash Params: {name=logstash}
    ClusterBlockException[blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];]
            at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:154)
            at org.elasticsearch.action.admin.indices.template.get.TransportGetIndexTemplatesAction.checkBlock(TransportGetIndexTemplatesAction.java:58)
            at org.elasticsearch.action.admin.indices.template.get.TransportGetIndexTemplatesAction.checkBlock(TransportGetIndexTemplatesAction.java:43)
            at org.elasticsearch.action.support.master.TransportMasterNodeAction.innerExecute(TransportMasterNodeAction.java:94)
            at org.elasticsearch.action.support.master.TransportMasterNodeAction.doExecute(TransportMasterNodeAction.java:86)
            at org.elasticsearch.action.support.master.TransportMasterNodeAction.doExecute(TransportMasterNodeAction.java:48)
            at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:70)
            at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:58)
            at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:347)
            at org.elasticsearch.client.FilterClient.doExecute(FilterClient.java:52)
...  
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            at java.lang.Thread.run(Thread.java:745)

And then

    [2015-11-20 12:28:55,530][INFO ][rest.suppressed          ] /_bulk Params: {}
    ClusterBlockException[blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];]
            at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:154)
            at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:144)
            at org.elasticsearch.action.bulk.TransportBulkAction.executeBulk(TransportBulkAction.java:207)
            at org.elasticsearch.action.bulk.TransportBulkAction.access$000(TransportBulkAction.java:66)
            at org.elasticsearch.action.bulk.TransportBulkAction$1.onFailure(TransportBulkAction.java:145)
            at org.elasticsearch.action.support.ThreadedActionListener$2.doRun(ThreadedActionListener.java:104)
            at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            at java.lang.Thread.run(Thread.java:745)

My config on the ES2.0 side:

    cluster.name: live
    node.name: srv-9
    path.data: /home/s/elastic-data
    path.logs: /home/s/elastic-logs
    bootstrap.mlockall: true
    gateway.recover_after_nodes: 8

    discovery.zen.ping.unicast.hosts: ["10.86.205.55:9300", "10.86.205.60:9300"]
    discovery.zen.minimum_master_nodes: 1
    discovery.zen.fd.ping_timeout: 30s
    discovery.zen.ping.multicast.enabled: false

    action.disable_delete_all_indices: true

    index.number_of_shards: 20
    index.number_of_replicas: 3

    network.bind_host: "0.0.0.0"
    network.publish_host: _non_loopback:ipv4_

Logstash2 output config:

    output {
      elasticsearch {
        hosts => ["10.86.205.62:9200"]
      }
      stdout { codec => rubydebug }
    }

Any thoughts or pointers?

I have tried restarting both ES and LS. It seems like LS is sending a big burst of messages. That's because it's reading from Redis, which already has several messages stored by the time LS comes up.

Thanks


(Mark Walkom) #2

What state is your ES cluster in?
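You can check with something like this (assuming ES is listening on the default `localhost:9200`):

```shell
# Cluster health summary: status, node count, shard counts
curl -s 'http://localhost:9200/_cluster/health?pretty'

# Same information in a one-line tabular form
curl -s 'http://localhost:9200/_cat/health?v'
```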


(Sukrit Dasgupta) #3

Hi @warkolm, since my posting I have removed the other hosts from the cluster and am trying to run this standalone with one master/data node. I still see the same issue. This is what I see:

    {
      "cluster_name" : "live",
      "status" : "red",
      "timed_out" : false,
      "number_of_nodes" : 1,
      "number_of_data_nodes" : 1,
      "active_primary_shards" : 0,
      "active_shards" : 0,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0,
      "delayed_unassigned_shards" : 0,
      "number_of_pending_tasks" : 0,
      "number_of_in_flight_fetch" : 0,
      "task_max_waiting_in_queue_millis" : 0,
      "active_shards_percent_as_number" : "NaN"
    }

and 'pending tasks' shows:

    {
      "error" : {
        "root_cause" : [ {
          "type" : "cluster_block_exception",
          "reason" : "blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];"
        } ],
        "type" : "cluster_block_exception",
        "reason" : "blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];"
      },
      "status" : 503
    }

Thanks!


(Mark Walkom) #4

It's pretty weird that it's red with no shards.
Is there data on disk anywhere?


(Sukrit Dasgupta) #5

No data anywhere. This is a new setup where I am trying to bring up LS2 talking to ES2.

LS2 is getting data from Redis, and as soon as LS2 comes up it seems to send a huge burst of data to ES2, which ends up stuck with this exception. LS2 eventually complains that its pipeline is full.

Thanks.


(Mark Walkom) #6

I'd stop everything, bring ES up on its own, make sure it is in a green state, and then create some test indices and see what happens.
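Something along these lines would show whether the cluster itself is healthy before Logstash gets involved (assuming the default `localhost:9200` endpoint):

```shell
# Wait for the cluster to report at least yellow before indexing anything
curl -s 'http://localhost:9200/_cluster/health?wait_for_status=yellow&timeout=30s&pretty'

# Create a throwaway test index and index one document
curl -s -XPUT 'http://localhost:9200/testindex'
curl -s -XPUT 'http://localhost:9200/testindex/doc/1' -d '{"msg": "hello"}'

# Confirm the test index exists and its shards were assigned
curl -s 'http://localhost:9200/_cat/indices?v'
```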


(Sukrit Dasgupta) #7

Was getting the same issue after shutting everything down and bringing up ES2. Removing the following config lines got this to work correctly:

    #discovery.zen.minimum_master_nodes: 1
    #discovery.zen.fd.ping_timeout: 30s
    #index.number_of_shards: 20
    #index.number_of_replicas: 3

I guess it kind of makes sense that there should be no replicas or multiple shards in a one-machine setup. I don't know whether this was really the cause, or whether there should be some error handling of the config. Just a lucky guess at this point from my side.
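For what it's worth, `gateway.recover_after_nodes: 8` from the original config would also keep a lone node stuck behind the `SERVICE_UNAVAILABLE/1/state not recovered` block, since cluster state recovery waits for eight nodes that never join. A minimal single-node sketch of the config above (a guess at a sane baseline, not a verified fix) might look like:

```yaml
cluster.name: live
node.name: srv-9
path.data: /home/s/elastic-data
path.logs: /home/s/elastic-logs
bootstrap.mlockall: true

# On one node, waiting for 8 nodes blocks state recovery forever;
# drop gateway.recover_after_nodes entirely, or set it to 1.

network.bind_host: "0.0.0.0"
network.publish_host: _non_loopback:ipv4_
```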

Thanks again!


(system) #8