Unassigned primary *and* replica shards

Greetings. I have an 8-node cluster, with 4 nodes each in different data
centers. Recently the cluster became partitioned (i.e. the nodes couldn't
see each other) due to firewalls.

I have resolved the firewall issue and all nodes once again show as
members of the cluster, but there are 8 indexes that each show a pair of
unassigned primary and replica shards (e.g. both the primary and the
replica of shard 3 of the same index).

{
  "cluster_name" : "xxxxxxxx",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 8,
  "number_of_data_nodes" : 8,
  "active_primary_shards" : 500,
  "active_shards" : 1000,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 16
}

I've looked into the reroute API
(http://www.elasticsearch.org/guide/reference/api/admin-cluster-reroute.html)
and tried the following:

curl -XPOST -s 'http://localhost:9200/_cluster/reroute?pretty=true' -d '{
  "commands" : [
    { "allocate" : {
        "index" : "0db5bb70b5d19bfceb5cebd5878fbcb905787745",
        "shard" : 3,
        "node" : "_9FezNn4TgWjftFeKBAMJw",
        "allow_primary" : 1
    } }
  ]
}'

The call seems to return successfully, but fails to allocate the shard. If
I leave "allow_primary" off, I get an error, as I would expect from reading
the API notes above.
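For what it's worth, the affected pairs can be listed programmatically from the cluster state. A minimal Ruby sketch, with an illustrative routing-table fragment embedded in place of the real response from GET /_cluster/state (the index name is the one from this thread; the fragment itself is made up):

```ruby
#!/usr/bin/ruby
require 'rubygems'
require 'json'

# Illustrative fragment of what GET /_cluster/state returns; in practice
# you would fetch this from http://localhost:9200/_cluster/state
state = JSON.parse(<<'EOS')
{
  "routing_table" : {
    "indices" : {
      "0db5bb70b5d19bfceb5cebd5878fbcb905787745" : {
        "shards" : {
          "3" : [
            { "state" : "UNASSIGNED", "primary" : true,  "shard" : 3, "node" : null },
            { "state" : "UNASSIGNED", "primary" : false, "shard" : 3, "node" : null }
          ]
        }
      }
    }
  }
}
EOS

# walk every shard copy of every index and keep the unassigned ones
unassigned = []
state['routing_table']['indices'].each do |index, data|
  data['shards'].each_value do |copies|
    copies.each do |copy|
      next unless copy['state'] == 'UNASSIGNED'
      kind = copy['primary'] ? 'primary' : 'replica'
      unassigned << "#{index}/#{copy['shard']} (#{kind})"
    end
  end
end

puts unassigned
```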

The question is this:

How do I recover from this failure and assign the unassigned shards, which
come in primary/replica pairs?

Any help would be greatly appreciated.

@matthias

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

I should also mention that we are running version 0.20.2.

@matthias

On Friday, March 22, 2013 10:56:36 AM UTC-6, Matthias Johnson wrote:


Well, I managed to get our cluster back to all green status.

I did this by leaning on karmi's excellent tire library
(https://github.com/karmi/tire).

Essentially, I re-indexed each broken index into a new one, deleted the
old one, and re-indexed back to the original name.

Here is the super-brief Ruby snippet:

#!/usr/bin/ruby

require 'rubygems'
require 'tire'

# point tire at the local node
Tire.configure do
  url "http://localhost:9200"
end

# copy every document from the broken index into the new one
Tire.index('broken').reindex 'broken-recover'

When that finished and I was sure the result looked good, I simply
modified the code to reverse the index names and re-ran it.

I'm not sure this is the most graceful solution, but it seemed to work
in this case with very small indexes (i.e. ones that were just recently
created).
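Spelled out, the round trip is four steps. A minimal sketch, where `client` is a hypothetical object standing in for the tire calls (`Tire.index(src).reindex(dst)` and `Tire.index(name).delete`), not tire's actual API:

```ruby
#!/usr/bin/ruby

# Sketch of the recovery round trip: copy the broken index aside, drop the
# original, and copy back. The `client` is hypothetical -- any object
# responding to reindex(src, dst) and delete(name).
def recover_index(client, broken)
  tmp = "#{broken}-recover"
  client.reindex(broken, tmp)   # step 1: copy documents aside
  client.delete(broken)         # step 2: drop the red index
  client.reindex(tmp, broken)   # step 3: copy back under the original name
  client.delete(tmp)            # step 4: clean up the temporary index
end
```

In practice you would verify the copy looks complete between each step rather than running all four back to back.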

Cheers,

@matthias

On Friday, March 22, 2013 10:56:36 AM UTC-6, Matthias Johnson wrote:


You can use aliases to skip that last re-index; we use that a lot
(we add timestamps to index names to indicate which index is the latest).
Using an alias, you can even swap from the old index to the new one and
revert back if needed.
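The swap itself is a single atomic request to the _aliases endpoint. A minimal sketch of the request body, with made-up timestamped index names; you would POST the generated JSON to http://localhost:9200/_aliases:

```ruby
#!/usr/bin/ruby
require 'rubygems'
require 'json'

# Atomic alias swap: repoint the "logs" alias from the old timestamped
# index to the new one. Both actions are applied in one step, so readers
# of the alias never see a moment with no index behind it.
swap = {
  'actions' => [
    { 'remove' => { 'index' => 'logs-20130322', 'alias' => 'logs' } },
    { 'add'    => { 'index' => 'logs-20130325', 'alias' => 'logs' } }
  ]
}

puts JSON.generate(swap)
```

Reverting is the same call with the two index names exchanged.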
Jaap Taal

[ Q42 BV | tel 070 44523 42 | direct 070 44523 65 | http://q42.nl |
Waldorpstraat 17F, Den Haag | Vijzelstraat 72 unit 4.23, Amsterdam |
KvK 30164662 ]

On Mon, Mar 25, 2013 at 5:17 PM, Matthias Johnson opennomad@gmail.com wrote:


Jaap, thanks for that idea. I'll keep that in mind for the future!

@matthias

On Friday, March 22, 2013 10:56:36 AM UTC-6, Matthias Johnson wrote:


We just had a very similar problem yesterday on 0.19.10, in a case where
machines in the cluster became unhealthy and had to be shut down. In a
couple of cases, after we'd reset the nodes, both a primary and a replica
shard needed to be recovered. The system seemed unable to do so
successfully.

We tried manual recovery with allow_primary, as in Matthias' example, and
also found that it had no effect.

The only solution we were able to find was to recreate the index, although
we did it without the tire library; we used aliasing as Jaap suggests to
make it easier to cut over once the new index was ready.

Jeff Miller

On Monday, March 25, 2013 10:32:03 AM UTC-7, Matthias Johnson wrote:
