UnavailableShardsException when adding document if number of nodes < number of replicas


(Creg Bradley) #1

Ordinarily, we would always have more nodes than replicas. However, in
failure scenarios, if a number of nodes go down such that there are more
replicas than nodes then we start getting the following exception:

{
"error":"UnavailableShardsException[[toomanyreplicas][4] [3] shardIt, [1]
active : Timeout waiting for [1m], request: index
{[toomanyreplicas][testdocument][QegOmpSTQZePZ1IaFeEmag], source[\n{\n
"fileAttachment" : "ZmlnaHRpbmc="\n}\n]}]",
"status":503
}

This is running the following gist:
https://gist.github.com/creg/7775457

Is this a failure scenario that should be supported, or do we need to
disable document adds when the cluster gets into this state? Detecting
when we've reached this state seems like it will be problematic.
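For reference, a minimal reproduction along the lines of the gist might look like this (a sketch only: it assumes a single local node on localhost:9200 and the 0.90-era REST API; the index name and document are taken from the error above, and the `consistency=one` workaround is my suggestion, not something from the gist):

```shell
# Create an index whose replica count exceeds what a one-node cluster can host.
curl -XPUT 'http://localhost:9200/toomanyreplicas' -d '{
  "settings": { "number_of_shards": 5, "number_of_replicas": 2 }
}'

# Indexing now times out: each shard has 3 configured copies but only the
# primary is active, so the default quorum write consistency cannot be met.
curl -XPOST 'http://localhost:9200/toomanyreplicas/testdocument' -d '{
  "fileAttachment": "ZmlnaHRpbmc="
}'

# Possible workaround: lower the write consistency for the request so that
# one active copy (the primary) is enough.
curl -XPOST 'http://localhost:9200/toomanyreplicas/testdocument?consistency=one' -d '{
  "fileAttachment": "ZmlnaHRpbmc="
}'
```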

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/4aa45093-4251-42d4-a79a-2723e258e839%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Alexander Reelsen) #2

Hey,

Can you describe the failure in more detail? Did all of the nodes
holding a copy of a specific shard fail, so that you have data loss?
You can check with the cluster health API: the cluster state will be
red in that case, as opposed to yellow or green, where you can still
index data.

In case of a red cluster state, those index requests are rejected
anyway, but it might make sense to reflect that state in your application.
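To make that check concrete (a sketch, assuming a node reachable on localhost:9200), the cluster health API reports the overall status directly:

```shell
# Returns a JSON document whose "status" field is "green", "yellow" or "red".
curl -XGET 'http://localhost:9200/_cluster/health?pretty'
```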

--Alex



(Creg Bradley) #3

Well, the gist repros the problem. The result is that you have one node
with 5 primary shards and 10 replicas, 2 per primary. No failure case is
necessary. I only brought up failure modes because having multiple
replicas of the same shard on the same node is pointless, but it could
happen if the right number of nodes fail.

The cluster is green in kopf when in this state.
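For what it's worth, the numbers in the error above ("[3] shardIt, [1] active") line up with the quorum write consistency Elasticsearch applies by default when an index has more than one replica. A quick sketch of that arithmetic (my reading of the error, not taken from the gist):

```shell
# 1 primary + 2 replicas = 3 configured copies of each shard.
copies=3

# Default write consistency is a quorum: a majority of the copies.
quorum=$(( copies / 2 + 1 ))
echo "quorum=$quorum"   # quorum=2

# Only the primary is active on a single node, so 1 < 2 and the request
# waits out its timeout, then fails with UnavailableShardsException.
active=1
[ "$active" -ge "$quorum" ] && echo "write allowed" || echo "write rejected"
```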



(system) #4