The restart node join cluster very slowly

foobarget · June 1, 2015, 6:22am

I have 3 nodes es cluster. Each node have 64G memory and 12 x 1T sata disk。The es version is 1.4.4

When I doing rolling restart, the restarted node join cluster very slowly even if I disable the cluster.routing.allocation.enable.

[2015-05-29 18:14:41,784][DEBUG][discovery.zen            ] [A] filtered ping responses: (filter_client[true], filter_data[false])
    --> ping_response{node [[C][r910JSFeSKmdAQGIsWQ0fA][test3][inet[/10.21.2.24:9300]]], id[99], master [[B][gf9EzizVQq6fgTraikW4ig][test2][inet
[/10.21.2.19:9300]]], hasJoinedOnce [true], cluster_name[elasticsearch]}
    --> ping_response{node [[B][gf9EzizVQq6fgTraikW4ig][test2][inet[/10.21.2.19:9300]]], id[262], master [[B][gf9EzizVQq6fgTraikW4ig][test2][ine
t[/10.21.2.19:9300]]], hasJoinedOnce [true], cluster_name[elasticsearch]}
[2015-05-29 18:14:41,790][DEBUG][transport.netty          ] [A] connected to node [[B][gf9EzizVQq6fgTraikW4ig][test2][inet[/10.21.2.19:9300]]]
[2015-05-29 18:15:08,769][WARN ][discovery                ] [A] waited for 30s and no initial state was set by the discovery
......
[2015-05-29 18:18:22,302][DEBUG][cluster.service          ] [A] processing [finalize_join ([B][gf9EzizVQq6fgTraikW4ig][test2][inet[/10.21.2.19:9
300]])]: execute

From the log, it taking 4 minutes to join cluster after several retrying

So I open the debug log。

When I kill the A node then start it. I found:
A node discovered the master node, but just like waiting for something and blocked.

this is B node's log

[2015-05-29 17:43:32,019][DEBUG][cluster.service          ] [B] set local cluster state to version 3810
[2015-05-29 17:49:12,923][DEBUG][discovery.zen.publish    ] [B] node [C][r910JSFeSKmdAQGIsWQ0fA][test3][inet[/xxx.xxx.xxx.xxx:9300]] responded for cl
uster state [3810] (took longer than [30s])

After the C responded the state 3810, then the A node join the cluster without any block。

Then I read the souce code and found the cluster may by block by the C node changeCluster state.
So I reproduce these stage and use jstack to dump the process stack 。

elasticsearch[231.189][clusterService#updateTask][T#1]" daemon prio=10 tid=0x00007f00a9b42000 nid=0x2f508 runnable [0x00007ef5f0e37000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.FileDispatcherImpl.force0(Native Method)
    at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76)
    at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376)
    at org.elasticsearch.gateway.local.state.shards.LocalGatewayShardsState.writeShardState(LocalGatewayShardsState.java:294)
    at org.elasticsearch.gateway.local.state.shards.LocalGatewayShardsState.clusterChanged(LocalGatewayShardsState.java:153)
    at org.elasticsearch.gateway.local.LocalGateway.clusterChanged(LocalGateway.java:208)
    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:460)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.j
ava:184)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:154)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

The I use strace to attach the process
I see

[pid 193800] stat("/data1/logsearch/data/elasticsearch/nodes/0/indices/xxx-2015.05.13/4/_state", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
[pid 193800] open("/data1/logsearch/data/elasticsearch/nodes/0/indices/xxx-2015.05.13/4/_state/state-63", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 2530
[pid 193800] stat("/data2/logsearch/data/elasticsearch/nodes/0/indices/xxx-2015.05.13/4/_state", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
[pid 193800] open("/data2/logsearch/data/elasticsearch/nodes/0/indices/xxx-2015.05.13/4/_state/state-63", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 2530
[pid 193800] stat("/data3/logsearch/data/elasticsearch/nodes/0/indices/xxx-2015.05.13/4/_state", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
[pid 193800] open("/data3/logsearch/data/elasticsearch/nodes/0/indices/xxx-2015.05.13/4/_state/state-63", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 2530
......

What happened in the cluster? How can I speed up the cluster restart?