Data corruption after adding/removing a node in a unicast cluster


(Steff) #1

Hi

We ran a simple test of shard rebalancing.

Start-state:
One index with 3 shards (1 replica)
Two nodes running (each with Node1, Node2 and Node3 in its unicast list):
Node1 running primary of shard1, primary of shard2 and replica of shard3
Node2 running primary of shard3, replica of shard1 and replica of shard2

Action:
We start a new node, Node3 (also with Node1, Node2 and Node3 in its
unicast list), which joins the cluster.

End-state (after rebalancing has finished):
Three nodes:
Node1 running primary of shard1 and primary of shard2
Node2 running primary of shard3
Node3 running replica of shard1, replica of shard2 and replica of shard3
Basically ALL replicas have been moved to the new node.
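To put numbers on the two layouts above, here is a minimal Java sketch that models each node's shard copies, counts copies per node, and checks that no node holds both the primary and the replica of the same shard. The class and method names are hypothetical and this is not Elasticsearch's allocator; it only restates the layouts reported in the post. With 3 shards and 1 replica there are 6 copies, so an even spread over three nodes would be 2 per node, whereas the reported end state is 2/1/3.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the allocations described in the post; not Elasticsearch code.
class ShardLayout {

    // A shard copy label such as "shard1/P" (primary) or "shard1/R" (replica).
    static String copy(int shard, boolean primary) {
        return "shard" + shard + (primary ? "/P" : "/R");
    }

    // Number of shard copies held by each node, in insertion order.
    static Map<String, Integer> countPerNode(Map<String, List<String>> layout) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : layout.entrySet()) {
            counts.put(e.getKey(), e.getValue().size());
        }
        return counts;
    }

    // True if some node holds both the primary and the replica of one shard.
    static boolean hasColocatedCopies(Map<String, List<String>> layout) {
        for (List<String> copies : layout.values()) {
            Set<String> shards = new HashSet<>();
            for (String c : copies) {
                String shard = c.substring(0, c.indexOf('/'));
                if (!shards.add(shard)) {
                    return true; // second copy of the same shard on this node
                }
            }
        }
        return false;
    }

    // Start state: two nodes.
    static Map<String, List<String>> reportedStartState() {
        Map<String, List<String>> layout = new LinkedHashMap<>();
        layout.put("Node1", List.of(copy(1, true), copy(2, true), copy(3, false)));
        layout.put("Node2", List.of(copy(3, true), copy(1, false), copy(2, false)));
        return layout;
    }

    // End state reported after Node3 joined.
    static Map<String, List<String>> reportedEndState() {
        Map<String, List<String>> layout = new LinkedHashMap<>();
        layout.put("Node1", List.of(copy(1, true), copy(2, true)));
        layout.put("Node2", List.of(copy(3, true)));
        layout.put("Node3", List.of(copy(1, false), copy(2, false), copy(3, false)));
        return layout;
    }

    public static void main(String[] args) {
        Map<String, List<String>> end = reportedEndState();
        int totalCopies = 3 * (1 + 1); // 3 shards x (1 primary + 1 replica)
        System.out.println("start=" + countPerNode(reportedStartState())); // {Node1=3, Node2=3}
        System.out.println("end=" + countPerNode(end));                     // {Node1=2, Node2=1, Node3=3}
        System.out.println("even share=" + totalCopies / end.size());       // 2
        System.out.println("co-located=" + hasColocatedCopies(end));        // false
    }
}
```

Both layouts respect the rule that a primary and its replica live on different nodes; the end state is simply not even by copy count, which is what the post finds strange.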

As before (see https://groups.google.com/group/elasticsearch/browse_thread/thread/232fdc4e560d41d),
we think this is a very strange way for ES to rebalance the shards.
But this time there were even bigger problems.

Then we took one more action:
We stopped the new node (Node3) again.

Rebalancing of the replicas back onto the remaining nodes (Node1 and
Node2) then starts. After a while, the exception shown below occurs on
one of the remaining nodes, and from then on the index is corrupted. No
matter what we do (restarts etc.), the cluster will not "accept" the
index again. We never regain access to the index, and the data must be
considered lost. This would be very bad in production.
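When Node3 is stopped, its three replicas go away and each shard is left with only its primary on Node1 or Node2. Elasticsearch does not place a replica on the same node as its own primary, so with two nodes left there is exactly one valid target per shard, and each new replica is a full copy that has to be recovered from its primary over the network. The sketch below (plain Java, hypothetical names, not Elasticsearch's allocation code) works out those targets from the reported end state:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: where can the lost replicas be re-created after Node3 leaves?
class ReplicaRecovery {

    // For each shard, pick the first remaining node that does not hold its primary.
    // Shards with no valid node are left out (their replica would stay unassigned).
    static Map<String, String> replicaTargets(Map<String, String> primaryNodeByShard,
                                              List<String> remainingNodes) {
        Map<String, String> targets = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : primaryNodeByShard.entrySet()) {
            for (String node : remainingNodes) {
                if (!node.equals(e.getValue())) { // replica must not share a node with its primary
                    targets.put(e.getKey(), node);
                    break;
                }
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        // Primaries as reported in the end state; all replicas were on Node3.
        Map<String, String> primaries = new LinkedHashMap<>();
        primaries.put("shard1", "Node1");
        primaries.put("shard2", "Node1");
        primaries.put("shard3", "Node2");

        List<String> remaining = List.of("Node1", "Node2"); // Node3 has been stopped
        System.out.println(replicaTargets(primaries, remaining));
        // {shard1=Node2, shard2=Node2, shard3=Node1}
    }
}
```

The result puts the cluster back into the 3/3 layout of the start state. With only one node left, the helper returns no target for shards whose primary is on that node, which corresponds to a replica that stays unassigned.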

I notice the OutOfMemoryError, but that really shouldn't happen, and
even if it does, it shouldn't corrupt the index/data permanently.
Any ideas about what to do? Solutions? Comments?

Regards, Per Steffensen
------------- exception ----------------------------
[2011-10-14 09:43:52,113][WARN ][transport.netty ] [Sybil Dorn] Exception caught on netty layer [[id: 0x5dc433a2, /192.168.88.240:60385 => /192.168.88.241:9300]]
java.lang.OutOfMemoryError: Java heap space
[2011-10-14 09:43:52,114][WARN ][transport.netty ] [Sybil Dorn] Exception caught on netty layer [[id: 0x5dc433a2, /192.168.88.240:60385 => /192.168.88.241:9300]]
java.io.StreamCorruptedException: invalid data length: 0
    at org.elasticsearch.transport.netty.SizeHeaderFrameDecoder.decode(SizeHeaderFrameDecoder.java:42)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:282)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:216)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:783)
    at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:65)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:274)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:261)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:349)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:280)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:200)
    at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:44)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
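The `StreamCorruptedException: invalid data length: 0` is raised by the transport's frame decoder, which reads a size header in front of each message. The following is a simplified, hypothetical decoder built on `java.nio.ByteBuffer` (not the actual `SizeHeaderFrameDecoder` source); it assumes each frame is a 4-byte big-endian length followed by that many payload bytes, and rejects a non-positive length. It illustrates why this particular error concerns the network stream rather than the index files: if the reader loses its place in the byte stream, for example after a failure part-way through a message, the bytes it takes for the next size header can be anything, including 0.

```java
import java.io.StreamCorruptedException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified length-prefixed frame decoder (not Elasticsearch's implementation).
// Assumed wire format: [4-byte big-endian length][payload of that many bytes], repeated.
class LengthPrefixedDecoder {

    static List<byte[]> decode(ByteBuffer in) throws StreamCorruptedException {
        List<byte[]> frames = new ArrayList<>();
        while (in.remaining() >= 4) {
            int dataLen = in.getInt(in.position()); // peek at the size header without consuming it
            if (dataLen <= 0) {
                // A well-formed stream never carries an empty or negative frame,
                // so the reader has lost its place in the byte stream.
                throw new StreamCorruptedException("invalid data length: " + dataLen);
            }
            if (in.remaining() < 4 + dataLen) {
                break; // incomplete frame: leave the bytes in place and wait for more
            }
            in.getInt(); // consume the header
            byte[] payload = new byte[dataLen];
            in.get(payload);
            frames.add(payload);
        }
        return frames;
    }

    public static void main(String[] args) throws Exception {
        ByteBuffer good = ByteBuffer.allocate(16);
        good.putInt(3).put(new byte[] {1, 2, 3});
        good.putInt(2).put(new byte[] {4, 5});
        good.flip();
        System.out.println(decode(good).size()); // 2

        ByteBuffer bad = ByteBuffer.allocate(8); // zero-filled, so the header reads as 0
        try {
            decode(bad);
        } catch (StreamCorruptedException e) {
            System.out.println(e.getMessage()); // invalid data length: 0
        }
    }
}
```

Once a decoder hits this state there is no reliable way to resynchronise on the same byte stream, so the connection is the thing that has to be discarded.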


(Shay Banon) #2

Which version are you using? Do you still have the logs around for the test
run, and if so, can you gist / attach them? OOM should not cause data loss,
and the failure you posted seems like a communication problem that might
happen because of the OOM.

On Tue, Oct 18, 2011 at 3:11 PM, Steff steff@designware.dk wrote:



(Steff) #3

On Oct 18, 8:19 pm, Shay Banon kim...@gmail.com wrote:

Which version are you using?

0.17.6

Do you still have the logs around for the test
run, and if so, can you gist / attach them?

Sorry, but they are lost.

OOM should not cause data loss,
and the failure you posted seems like a communication problem that might
happen because of the OOM.

Hopefully we will get around to repeating the test at some point, and I
will make sure to collect the logs and attach them here.

Regards, Per Steffensen


(Shay Banon) #4

0.17.6 might be the culprit, as an OOM-related fix went into 0.17.7.
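Whether a node predates the release with that fix can be checked by comparing dotted version numbers field by field. A minimal Java sketch, assuming purely numeric versions such as `0.17.6` (no suffixes like `-SNAPSHOT`); the class and method names are hypothetical:

```java
// Hypothetical helper: compare purely numeric dotted versions such as "0.17.6".
class VersionCheck {

    // Negative if a < b, zero if equal, positive if a > b.
    static int compareVersions(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0; // missing fields count as 0
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        String fixedIn = "0.17.7"; // per this thread, the release containing an OOM-related fix
        for (String v : new String[] {"0.17.6", "0.17.7", "0.18.2"}) {
            boolean hasFix = compareVersions(v, fixedIn) >= 0;
            System.out.println(v + " includes fix: " + hasFix);
        }
        // 0.17.6 includes fix: false
        // 0.17.7 includes fix: true
        // 0.18.2 includes fix: true
    }
}
```

Comparing numerically matters: as plain strings, `"0.17.10"` sorts before `"0.17.7"`.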

On Mon, Oct 24, 2011 at 7:43 PM, Steff steff@designware.dk wrote:



(Steff) #5

On 25 Okt., 00:32, Shay Banon kim...@gmail.com wrote:

0.17.6 might be the culprit, as an OOM-related fix went into 0.17.7.

OK, we will consider upgrading when we are ready, or when we rerun the
test and see the problem again.


(Steff) #6

On 24 Okt., 23:32, Shay Banon kim...@gmail.com wrote:

0.17.6 might be the culprit, as an OOM-related fix went into 0.17.7.

We plan to upgrade from 0.17.6 to 0.18.2 now. Is there any information
available on whether you can simply do a software upgrade between those
versions without having to worry about the configuration or the data
already stored by 0.17.6? In other words, can I just stop all nodes in
my existing 0.17.6 cluster (already containing indices with data),
upgrade the ES installation on those nodes, copy the data folders and
elasticsearch.yml, and then start all nodes again running 0.18.2? Or
have some configuration entries been removed (or changed semantics),
has the data format changed, or anything like that?

Regards, Per Steffensen


(Shay Banon) #7

Yes, you can simply stop the nodes, switch to the new elasticsearch
version, and start them.
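The copy step from the question (carrying the data folders and `elasticsearch.yml` over to the new installation while all nodes are stopped) can be scripted. A minimal Java sketch, assuming the old and new installations sit side by side and that each keeps its data under `<home>/data` and its config in `<home>/config/elasticsearch.yml`; those paths and the class name are assumptions, so adjust them to the actual setup:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

// Hypothetical sketch of the "copy data folders and elasticsearch.yml" step of an upgrade.
// Assumed layout for both installs: <home>/data and <home>/config/elasticsearch.yml.
// Run only while all nodes are stopped.
class CopyForUpgrade {

    // Recursively copy a directory tree, creating directories as needed.
    static void copyTree(Path source, Path target) throws IOException {
        try (Stream<Path> paths = Files.walk(source)) { // pre-order: directories before their contents
            for (Path p : (Iterable<Path>) paths::iterator) {
                Path dest = target.resolve(source.relativize(p));
                if (Files.isDirectory(p)) {
                    Files.createDirectories(dest);
                } else {
                    Files.createDirectories(dest.getParent());
                    Files.copy(p, dest, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    // Copy the data directory and the config file from the old home to the new home.
    static void carryOver(Path oldHome, Path newHome) throws IOException {
        copyTree(oldHome.resolve("data"), newHome.resolve("data"));
        Path yml = Paths.get("config", "elasticsearch.yml");
        Files.createDirectories(newHome.resolve("config"));
        Files.copy(oldHome.resolve(yml), newHome.resolve(yml), StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: CopyForUpgrade <old-es-home> <new-es-home>");
            return;
        }
        carryOver(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

If the data directory already lives outside the installation directory (for example via the `path.data` setting), there is nothing to copy and the upgrade reduces to stop, swap version, start.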

On Tue, Nov 1, 2011 at 12:59 PM, Steff steff@designware.dk wrote:


