Hi,
We have repeatedly run into a problem where our search cluster ends up in an
inconsistent state. We have 3 nodes (all running 1.0.1); nodes 2 and 3 hold
the data (each holds all shards, i.e. one replica per shard). Occasionally a
long GC pause occurs on one of the nodes (here on node 3), and the node gets
dropped from the cluster because the pause outlasts the failure detection
(here the GC took 35.1s, while our fd ping timeout is 9s with 2 retries,
i.e. roughly 18s in total):
NODE 1
[2014-03-27 00:55:41,032][WARN ][discovery.zen ] [node1] received cluster state from [[node2][A45sMYqtQsGrwY5exK0sEg][node2][inet[/10.216.32.81:9300]]{master=true}] which is also master but with an older cluster_state, telling [[node2][A45sMYqtQsGrwY5exK0sEg][node2][inet[/10.216.32.81:9300]]{master=true}] to rejoin the cluster
[2014-03-27 00:55:41,033][WARN ][discovery.zen ] [node1] failed to send rejoin request to [[node2][A45sMYqtQsGrwY5exK0sEg][node2][inet[/10.216.32.81:9300]]{master=true}]
org.elasticsearch.transport.SendRequestTransportException: [node2][inet[/10.216.32.81:9300]][discovery/zen/rejoin]
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:202)
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:173)
    at org.elasticsearch.discovery.zen.ZenDiscovery$7.execute(ZenDiscovery.java:556)
    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:308)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:134)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
Caused by: org.elasticsearch.transport.NodeNotConnectedException: [node2][inet[/10.216.32.81:9300]] Node not connected
    at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:859)
    at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:540)
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:189)
    ... 7 more
[2014-03-27 01:54:45,722][WARN ][discovery.zen ] [node1] received cluster state from [[node2][A45sMYqtQsGrwY5exK0sEg][node2][inet[/10.216.32.81:9300]]{master=true}] which is also master but with an older cluster_state, telling [[node2][A45sMYqtQsGrwY5exK0sEg][node2][inet[/10.216.32.81:9300]]{master=true}] to rejoin the cluster
[2014-03-27 01:54:45,723][WARN ][discovery.zen ] [node1] failed to send rejoin request to [[node2][A45sMYqtQsGrwY5exK0sEg][node2][inet[/10.216.32.81:9300]]{master=true}]
org.elasticsearch.transport.SendRequestTransportException: [node2][inet[/10.216.32.81:9300]][discovery/zen/rejoin]
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:202)
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:173)
    at org.elasticsearch.discovery.zen.ZenDiscovery$7.execute(ZenDiscovery.java:556)
    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:308)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:134)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
Caused by: org.elasticsearch.transport.NodeNotConnectedException: [node2][inet[/10.216.32.81:9300]] Node not connected
    at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:859)
    at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:540)
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:189)
    ... 7 more
[2014-03-27 07:19:02,889][WARN ][discovery.zen ] [node1] received cluster state from [[node2][A45sMYqtQsGrwY5exK0sEg][node2][inet[/10.216.32.81:9300]]{master=true}] which is also master but with an older cluster_state, telling [[node2][A45sMYqtQsGrwY5exK0sEg][node2][inet[/10.216.32.81:9300]]{master=true}] to rejoin the cluster
[2014-03-27 07:19:02,889][WARN ][discovery.zen ] [node1] failed to send rejoin request to [[node2][A45sMYqtQsGrwY5exK0sEg][node2][inet[/10.216.32.81:9300]]{master=true}]
org.elasticsearch.transport.SendRequestTransportException: [node2][inet[/10.216.32.81:9300]][discovery/zen/rejoin]
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:202)
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:173)
    at org.elasticsearch.discovery.zen.ZenDiscovery$7.execute(ZenDiscovery.java:556)
    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:308)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:134)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
Caused by: org.elasticsearch.transport.NodeNotConnectedException: [node2][inet[/10.216.32.81:9300]] Node not connected
    at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:859)
    at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:540)
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:189)
    ... 7 more
NODE 2
[2014-03-27 07:19:02,871][INFO ][cluster.service ] [node2] removed {[node3][RRqWlTWnQ7ygvsOaJS0_mA][node3][inet[/10.235.38.84:9300]]{master=true},}, reason: zen-disco-node_failed([node3][RRqWlTWnQ7ygvsOaJS0_mA][node3][inet[/10.235.38.84:9300]]{master=true}), reason failed to ping, tried [2] times, each with maximum [9s] timeout
NODE 3
[2014-03-27 07:19:20,055][WARN ][monitor.jvm ] [node3] [gc][old][539697][754] duration [35.1s], collections [1]/[35.8s], total [35.1s]/[2.7m], memory [4.9gb]->[4.2gb]/[7.9gb], all_pools {[young] [237.8mb]->[7.4mb]/[266.2mb]}{[survivor] [25.5mb]->[0b]/[33.2mb]}{[old] [4.6gb]->[4.2gb]/[7.6gb]}
[2014-03-27 07:19:20,112][INFO ][discovery.zen ] [node3] master_left [[node2][A45sMYqtQsGrwY5exK0sEg][node2][inet[/10.216.32.81:9300]]{master=true}], reason [do not exists on master, act as master failure]
[2014-03-27 07:19:20,117][INFO ][cluster.service ] [node3] master {new [node1][DxlcpaqOTmmpNSRoqt1sZg][node1.example][inet[/10.252.78.88:9300]]{data=false, master=true}, previous [node2][A45sMYqtQsGrwY5exK0sEg][node2][inet[/10.216.32.81:9300]]{master=true}}, removed {[node2][A45sMYqtQsGrwY5exK0sEg][node2][inet[/10.216.32.81:9300]]{master=true},}, reason: zen-disco-master_failed ([node2][A45sMYqtQsGrwY5exK0sEg][node2][inet[/10.216.32.81:9300]]{master=true})
After this scenario, the cluster does not recover properly. Worst of all, the
nodes end up with conflicting views: node 1 sees nodes 1+3, node 2 sees nodes
1+2, and node 3 sees nodes 1+3. Because each half still contains two nodes
(and minimum_master_nodes is 2), both data nodes 2 and 3 keep accepting
writes and searches, which leads to inconsistent results and forces us to do
a full cluster restart and reindex all production data to get the cluster
back to a consistent state.
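The node views below were gathered by querying each node's HTTP port directly (port 9200 is only an assumption here, i.e. the default; responses abridged):

# one call per node; each node answers with its own view of the cluster
curl -s 'http://node1.example:9200/_nodes?pretty'
curl -s 'http://node2.example:9200/_nodes?pretty'
curl -s 'http://node3.example:9200/_nodes?pretty'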
NODE 1 (GET /_nodes):
{
  "cluster_name" : "elasticsearch",
  "nodes" : {
    "DxlcpaqOTmmpNSRoqt1sZg" : {
      "name" : "node1",
      ...
    },
    "RRqWlTWnQ7ygvsOaJS0_mA" : {
      "name" : "node3",
      ...
    }
  }
}
NODE 2 (GET /_nodes):
{
  "cluster_name" : "elasticsearch",
  "nodes" : {
    "A45sMYqtQsGrwY5exK0sEg" : {
      "name" : "node2",
      ...
    },
    "DxlcpaqOTmmpNSRoqt1sZg" : {
      "name" : "node1",
      ...
    }
  }
}
NODE 3 (GET /_nodes):
{
  "cluster_name" : "elasticsearch",
  "nodes" : {
    "DxlcpaqOTmmpNSRoqt1sZg" : {
      "name" : "node1",
      ...
    },
    "RRqWlTWnQ7ygvsOaJS0_mA" : {
      "name" : "node3",
      ...
    }
  }
}
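For completeness, which node each half considers master can also be compared with the _cat API that ships with 1.0 (same port assumption as above):

curl -s 'http://node1.example:9200/_cat/master?v'
curl -s 'http://node2.example:9200/_cat/master?v'
curl -s 'http://node3.example:9200/_cat/master?v'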
Here are the configurations:
BASE CONFIG (for all nodes):
action:
  disable_delete_all_indices: true
discovery:
  zen:
    fd:
      ping_retries: 2
      ping_timeout: 9s
    minimum_master_nodes: 2
    ping:
      multicast:
        enabled: false
      unicast:
        hosts: ["node1.example", "node2.example", "node3.example"]
index:
  fielddata:
    cache: node
indices:
  fielddata:
    cache:
      size: 40%
  memory:
    index_buffer_size: 20%
threadpool:
  bulk:
    queue_size: 100
    type: fixed
transport:
  tcp:
    connect_timeout: 3s
NODE 1:
node:
  data: false
  master: true
  name: node1
NODE 2:
node:
  data: true
  master: true
  name: node2
NODE 3:
node:
  data: true
  master: true
  name: node3
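For reference, with ping_retries: 2 and ping_timeout: 9s the failure detector gives up after roughly 2 x 9s = 18s, which a 35.1s GC pause easily exceeds. A sketch of what raising that window might look like (values purely illustrative, not something we have tested):

discovery:
  zen:
    fd:
      # illustrative values only: wide enough to ride out a ~35s GC pause
      ping_retries: 3
      ping_timeout: 30s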
Questions:
- What can we do to minimize long GC runs, so that the nodes don't become
unresponsive and get disconnected in the first place? (FYI: our index is
currently about 80 GB with over 2M docs per node, 60 shards, and an 8 GB
heap. We run both searches and aggregations on it.)
- Obviously, having the cluster in a state like the above is unacceptable.
We therefore want to make sure that even if a node drops out because of GC,
the cluster can fully recover, and that only one of the two data nodes
accepts data and searches while the other is disconnected (a per-node health
check sketch follows these questions). Is there anything that needs to be
changed in the Elasticsearch code to fix this issue?
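As mentioned above, a simple check we can run during an incident is to hit /_cluster/health on every node and compare number_of_nodes (again assuming HTTP port 9200). In a healthy cluster all three nodes report 3; in the broken state above each side reports only 2:

curl -s 'http://node1.example:9200/_cluster/health?pretty'
curl -s 'http://node2.example:9200/_cluster/health?pretty'
curl -s 'http://node3.example:9200/_cluster/health?pretty'
# healthy: every response contains "number_of_nodes" : 3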
Thanks,
Thomas