Unassigned shard out of nowhere - restarting the node makes everything green again

We are running a 2-node ES cluster (version 0.20.5) with 5 indexes, each with
5 shards and 1 replica.
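For reference, this layout works out to 25 primary shards and 50 shard copies in total. The sketch below only does that arithmetic; the constant names are illustrative, not Elasticsearch settings. Because Elasticsearch does not allocate a replica to the same node as its primary, each of the two nodes holds exactly one copy of every shard when everything is assigned. A single unassigned copy therefore leaves that shard with no redundancy until it is reallocated, and if the unassigned copy is a replica, cluster health shows yellow rather than green.

```python
# Shard-copy arithmetic for the cluster described above
# (values from the post: 2 nodes, 5 indexes, 5 shards each, 1 replica).
NODES = 2
INDEXES = 5
SHARDS_PER_INDEX = 5
REPLICAS = 1

primaries = INDEXES * SHARDS_PER_INDEX       # primary shards across all indexes
total_copies = primaries * (1 + REPLICAS)    # primaries plus their replicas

# A replica is never allocated to the node holding its primary, so with
# 2 nodes and 1 replica each node holds exactly one copy of every shard
# once everything is assigned.
copies_per_node = total_copies // NODES

print(primaries, total_copies, copies_per_node)  # 25 50 25
```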

Everything is working fine, but twice now I have had a single shard become
unassigned. The only thing I can do to get it back into the cluster is to
restart the node that is now missing a shard.

All I see in the log that looks like it could be related is:

[2013-05-14 02:17:03,214][WARN ][cluster.action.shard ] [Caregiver] received shard failed for [shares_20130510][2], node[EVw9ssCnQWm4u9qUgpln_g], [R], s[STARTED], reason [Failed to perform [index] on replica, message [RemoteTransportException[Failed to deserialize exception response from stream]; nested: TransportSerializationException[Failed to deserialize exception response from stream]; nested: EOFException; ]]

but I don't really know what that means, and it's supposedly only a warning.
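Read field by field, the WARN line logged by node [Caregiver] reports that the replica copy ([R]) of shard 2 of index shares_20130510, hosted on node EVw9ssCnQWm4u9qUgpln_g and in state STARTED, failed while an [index] operation was being applied to it. As far as the message shows, the reason chain ends in EOFException: the stream ended before the exception sent back by the other node could be deserialized, which points at communication or serialization between the two nodes. The sketch below splits such a line into its fields; `parse_shard_failed`, `SHARD_FAILED` and `EXCEPTION_NAME` are hypothetical helpers, and the regex assumes exactly the 0.20.x format quoted here.

```python
import re

# Hypothetical helper (not part of Elasticsearch): splits a "received shard
# failed" WARN line, in the 0.20.x format quoted above, into its fields.
SHARD_FAILED = re.compile(
    r"received shard failed for \[(?P<index>[^\]]+)\]\[(?P<shard>\d+)\], "
    r"node\[(?P<node>[^\]]+)\], \[(?P<role>[PR])\], s\[(?P<state>[A-Z_]+)\]"
)
# Exception class names in the "nested:" chain, outermost first.
EXCEPTION_NAME = re.compile(r"\b\w*Exception\b")


def parse_shard_failed(line):
    """Return the parsed fields as a dict, or None if the line doesn't match."""
    m = SHARD_FAILED.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["shard"] = int(fields["shard"])
    # [P] marks the primary copy of the shard, [R] a replica copy.
    fields["role"] = "primary" if fields["role"] == "P" else "replica"
    fields["causes"] = EXCEPTION_NAME.findall(line)
    return fields


# The log line from the post, unwrapped.
LOG_LINE = (
    "[2013-05-14 02:17:03,214][WARN ][cluster.action.shard ] [Caregiver] "
    "received shard failed for [shares_20130510][2], "
    "node[EVw9ssCnQWm4u9qUgpln_g], [R], s[STARTED], reason [Failed to perform "
    "[index] on replica, message [RemoteTransportException[Failed to deserialize "
    "exception response from stream]; nested: "
    "TransportSerializationException[Failed to deserialize exception response "
    "from stream]; nested: EOFException; ]]"
)

print(parse_shard_failed(LOG_LINE))
```

Lines in other formats (or from other ES versions) simply return None rather than being half-parsed.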

Is there a way to prevent this from happening, and is there a way to get the
cluster to assign the shard other than restarting the whole node and waiting
for replication / rebalancing?

Thanks,
--chad

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

This sounds like you have different versions of ES (node or client) running
in the same cluster. If not, check that you're using the same version of
Java on all boxes.

clint

(Reply to Chad Kouse's message of 15 May 2013 05:18.)

The ES version is the same on both nodes, but the Java versions differ:

node 1:
java version "1.6.0_20"
OpenJDK Runtime Environment (IcedTea6 1.9.10) (6b20-1.9.10-0ubuntu1~10.04.3)
OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)

node 2:
java version "1.6.0_27"
OpenJDK Runtime Environment (IcedTea6 1.12.3) (6b27-1.12.3-0ubuntu1~10.04.1)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)
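A mismatch like this is easy to miss by eye, so a small check over the `java -version` output from each node can help. This is a minimal sketch: `java_fingerprint` and `java_mismatch` are hypothetical helpers, and collecting the output from each node (for example over ssh) is left out. Note that `java -version` writes to stderr, so capture stderr when gathering it.

```python
import re

# Hypothetical helpers: pull the Java version and VM build out of
# `java -version` output collected from each node, and report a mismatch.
JAVA_VERSION = re.compile(r'java version "([^"]+)"')
VM_BUILD = re.compile(r"\(build ([^,)]+)")


def java_fingerprint(output):
    """Return (version, vm_build) parsed from `java -version` output."""
    version = JAVA_VERSION.search(output)
    build = VM_BUILD.search(output)
    return (version.group(1) if version else None,
            build.group(1) if build else None)


def java_mismatch(outputs_by_node):
    """Return {node: fingerprint} if the nodes disagree, else an empty dict."""
    prints = {node: java_fingerprint(out) for node, out in outputs_by_node.items()}
    return prints if len(set(prints.values())) > 1 else {}


# The `java -version` output quoted above for the two nodes.
OUTPUTS = {
    "node 1": 'java version "1.6.0_20"\n'
              "OpenJDK Runtime Environment (IcedTea6 1.9.10) (6b20-1.9.10-0ubuntu1~10.04.3)\n"
              "OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)\n",
    "node 2": 'java version "1.6.0_27"\n'
              "OpenJDK Runtime Environment (IcedTea6 1.12.3) (6b27-1.12.3-0ubuntu1~10.04.1)\n"
              "OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)\n",
}

for node, fingerprint in java_mismatch(OUTPUTS).items():
    print(node, fingerprint)
```

Comparing the VM build as well as the version string catches nodes that report the same Java version but run different VM builds.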

I will update them to the same version and report back if this happens again.

-- chad

(Reply to Clinton Gormley's message of Wednesday, May 15, 2013 at 6:22 AM.)

Those versions are pretty old. You should look at upgrading them to a recent
Java 7 from Oracle.

clint

(Sent 15 May 2013 17:16.)