All of our indices have number_of_replicas 2, so every shard is
replicated to every node (we have 3 nodes and 3 copies of each shard).
So when a node goes down, no shards should be reallocated.
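For what it's worth, this is roughly the check I can run against the cluster
health API to confirm that (a quick Python sketch; the host/port and the exact
field names on 0.10 are assumptions on my part):

    import json
    from urllib.request import urlopen

    # Placeholder host/port; any node's HTTP port will do.
    with urlopen("http://localhost:9200/_cluster/health") as resp:
        health = json.loads(resp.read())

    # If every shard really has a copy on every node, these counters should
    # stay at 0 after one node drops.
    print(health["status"])
    print(health.get("relocating_shards"),
          health.get("initializing_shards"),
          health.get("unassigned_shards"))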
I can say for sure that the shut-down node is detected and removed
immediately. Here is a segment of the log from one node (the non-master
node) when this error occurs. The node was shut down at 11:00:
[11:00:02,263][INFO ][cluster.service ] [Darkdevil] removed {[Cassiopea][d51ef323-99c6-4f2d-b2ff-8b444294c2fa][inet[/10.102.9.169:9300]],}, reason: zen-disco-receive(from [[Inertia][d83bddd4-a83f-45b5-863b-3aea1bf96722][inet[/10.102.19.139:9300]]])
[11:33:42,705][WARN ][monitor.jvm ] [Darkdevil] Long GC collection occurred, took [10.6s], breached threshold [10s]
[11:34:05,341][WARN ][monitor.jvm ] [Darkdevil] Long GC collection occurred, took [10s], breached threshold [10s]
[11:34:18,076][WARN ][monitor.jvm ] [Darkdevil] Long GC collection occurred, took [12.3s], breached threshold [10s]
[11:34:31,183][WARN ][monitor.jvm ] [Darkdevil] Long GC collection occurred, took [11s], breached threshold [10s]
[11:34:47,133][WARN ][monitor.jvm ] [Darkdevil] Long GC collection occurred, took [15.6s], breached threshold [10s]
[11:36:11,656][WARN ][monitor.jvm ] [Darkdevil] Long GC collection occurred, took [1.3m], breached threshold [10s]
[11:36:50,326][WARN ][transport.netty ] [Darkdevil] Exception caught on netty layer [[id: 0x242110fc, /10.102.43.160:49037 => /10.102.19.139:9300]]
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Arrays.java:3221)
    at java.lang.String.<init>(String.java:233)
    at org.elasticsearch.common.io.stream.StreamInput.readUTF(StreamInput.java:183)
    at org.elasticsearch.common.io.stream.HandlesStreamInput.readUTF(HandlesStreamInput.java:49)
    at org.elasticsearch.monitor.jvm.JvmInfo.readFrom(JvmInfo.java:206)
    at org.elasticsearch.monitor.jvm.JvmInfo.readJvmInfo(JvmInfo.java:199)
    at org.elasticsearch.action.admin.cluster.node.info.NodeInfo.readFrom(NodeInfo.java:207)
    at org.elasticsearch.transport.netty.MessageChannelHandler.handleResponse(MessageChannelHandler.java:124)
    at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:102)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
I will try to get a heap dump and put it on s3 for you to download.
Stay tuned.
Grant
On Sep 15, 12:07 pm, Shay Banon shay.ba...@elasticsearch.com wrote:
Hi,
First, a clarification. When you shut down the third node, shards that
were allocated on it will get reallocated to the other two nodes. This is
done to maintain the required number of replicas. For example, if you have
an index with 3 shards and 1 replica, and you have 3 nodes, then each node
will have 2 shards. If you shut down the third node, you will then have
2 nodes with 3 shards each (and if you shut down the second node as well,
you will still have 1 node with 3 shards, since it makes no sense to
allocate a replica of a shard on the same node as its primary).
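To make the arithmetic concrete, here is a tiny Python sketch of that rule
(simplified, just mirroring the example above, not the real allocator):

    def copies_per_node(shards, replicas, nodes):
        # Each shard has (replicas + 1) copies, but two copies of the same
        # shard never share a node, so at most `nodes` copies can be live.
        live_copies = shards * min(replicas + 1, nodes)
        return live_copies / nodes

    print(copies_per_node(3, 1, 3))  # 2.0 shards per node with all 3 nodes up
    print(copies_per_node(3, 1, 2))  # 3.0 shards per node after one node leaves
    print(copies_per_node(3, 1, 1))  # 3.0 shards on the single remaining node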
What you see is very interesting. First, if you bring that node back after
one hour, the shards should get reallocated and you will end up with 2 shards
per node again, so memory usage should certainly not increase. You say that
you see the increase 30 minutes after you killed the node. Can you check the
logs and see whether those 30 minutes are the time it took to detect that the
node was removed (because of long discovery fault detection settings)? Those
30 minutes might also be the time it took to fully recover the reallocated
shards until they were fully operational again. This will help narrow down
the problem a bit.
If you want to know more about what the cluster looks like, you can check the
cluster state API; it will tell you which nodes are in the cluster, which
indices exist, and which shards are allocated on which node.
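For example, something along these lines should do it (a Python sketch; the
host/port are placeholders and the exact response layout may differ on 0.10):

    import json
    from urllib.request import urlopen

    # Placeholder host/port; any node in the cluster will do.
    with urlopen("http://localhost:9200/_cluster/state") as resp:
        state = json.loads(resp.read())

    # The nodes the master currently knows about.
    for node_id, node in state["nodes"].items():
        print(node_id, node.get("name"))

    # The routing table shows which shards are allocated on which node.
    print(json.dumps(state.get("routing_table", {}), indent=2))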
The best and quickest way for me to try and help is if you can take a
heap dump when this spike happens (even for the first one). It's a bit of a
pain to get, but that's the best way for me to check where the memory
problem is. To get a heap dump, you can use the latest visualvm (https://visualvm.dev.java.net/), run it against the JVM in question
(simplest is to run it on the same box, where it will autodetect it), and issue a
heap dump. You can also use jmap with the -dump option (http://download.oracle.com/javase/6/docs/technotes/tools/share/jmap.html).
Once you have a heap dump, then comes the interesting bit: tar-gz it and
somehow provide me with a link to download it ;).
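As a rough sketch of the jmap route (Python, run on the same box as the node;
the pid and file names are placeholders):

    import subprocess
    import tarfile

    pid = "12345"                  # placeholder: the elasticsearch JVM pid
    dump = "/tmp/es-heap.hprof"    # placeholder: where to write the dump

    # jmap -dump writes a binary heap dump of the running JVM (JDK 6 tooling).
    subprocess.check_call(["jmap", "-dump:format=b,file=" + dump, pid])

    # Tar-gz it so it is small enough to upload and share.
    with tarfile.open(dump + ".tar.gz", "w:gz") as tar:
        tar.add(dump, arcname="es-heap.hprof")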
I would love to help solve this problem you are having. I am here to
help.
-shay.banon
On Wed, Sep 15, 2010 at 8:26 PM, Grant Rodgers gra...@gmail.com wrote:
Oops, just noticed there is an unfinished thought in the previous
message. It should be:
(the data gaps in the graphs for the nodes that were not stopped are
due to the elasticsearch jmx endpoint not responding)
On Sep 15, 11:17 am, Grant Rodgers gra...@gmail.com wrote:
I'm seeing strange behavior with my cluster of 3 nodes. It seems like
when one node drops out, the other nodes suddenly spike in cpu and
memory usage after a short time. We are using ES 0.10.0 with a 7GB
heap.
We have an intensive backup task on one node that causes high iowait,
and occasionally that node drops out of the cluster. To avoid split-brain,
we started shutting that node down during the backup task.
This shouldn't cause any big activity elsewhere in the cluster, since
all our shards are fully replicated on each node. The third node comes
back after an hour or so.
A short time after the node shuts down, the other two nodes suddenly
see a significant CPU and load spike, and memory usage increases
significantly. One node saw a 2GB jump, the other a 4GB jump.
Interestingly, the increases happen 20-30 minutes after the shut-down
node leaves the cluster, and appear to coincide with garbage
collection.
Here are munin graphs for elasticsearch memory usage to illustrate the
problem:
master node: http://i.imgur.com/8GTkb.png
non-master node that was not stopped: http://i.imgur.com/MIVPZ.png
non-master node that was stopped: http://i.imgur.com/jkuYg.png
(the gaps for the nodes that were not stopped are due to the munin
collector being unable to
As you can see in the graphs, this newly allocated memory is not
garbage collected. The next day, when the node is shut down again, the
accompanying memory spike hits the limit on both remaining nodes.
Additionally, one CPU is pegged at 100% for several hours, probably due
to GC storms and likely exacerbated by elasticsearch trying to recover
after hitting the memory limit. Once the memory limit is hit, lots of
buffer-overrun and XML-parsing exceptions start being thrown.
Shay, can you shed any light on this issue?
Thanks,
Grant