Big issue

Hello, after some network problems and what seemed like a split brain,
we ended up with this state:

{
  "cluster_name" : "elasticsearch0",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 11,
  "number_of_data_nodes" : 11,
  "active_primary_shards" : 212,
  "active_shards" : 212,
  "relocating_shards" : 1,
  "initializing_shards" : 0,
  "unassigned_shards" : 38
}

Note that we have 38 unassigned_shards and 0 initializing shards. We
have all the data (from backup), but out of the 11 servers running it
looks like those shards are simply never assigned to the nodes. Again,
I have all my index data saved; even those 38 shards are sitting under
/index on various nodes. However, with no errors in the error log, we
simply can't exit the RED state. Is there a way to repair the metadata
somehow?

Thanks,
-Jack
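(The numbers above already tell the story: active_shards equals active_primary_shards, so no replicas are active, and red status means some primaries are missing entirely. A minimal sketch of that bookkeeping, using the field names from the cluster health response; the summary helper itself is ours, not part of any API:)

```python
# Interpret a _cluster/health response: red status means at least one
# primary shard is unallocated; active - primaries gives active replicas.
def summarize_health(health):
    active = health["active_shards"]
    primaries = health["active_primary_shards"]
    unassigned = health["unassigned_shards"]
    parts = [
        f"status={health['status']}",
        f"{primaries} primaries active",
        f"{active - primaries} replicas active",
        f"{unassigned} unassigned",
    ]
    return ", ".join(parts)

health = {
    "status": "red",
    "active_primary_shards": 212,
    "active_shards": 212,
    "unassigned_shards": 38,
}
print(summarize_health(health))
# status=red, 212 primaries active, 0 replicas active, 38 unassigned
```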

Are you saying you recovered the data from backup and those shards don't
initialize? Make sure to recover the _state directory under the data
location (nodes/0), especially if you are using 0.18.
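(In 0.18 terms that recovery step amounts to something like the sketch below. The `nodes/0` path is the default single-node-per-host layout and is an assumption here; adjust to your actual `path.data` settings, and only do this with the node shut down:)

```python
import shutil
from pathlib import Path

def restore_cluster_state(backup_node_dir, live_node_dir):
    """Replace the live _state directory with the copy from a backup.

    Both arguments should point at the node directory, e.g.
    .../data/<cluster_name>/nodes/0 (assumed 0.18 default layout).
    """
    src = Path(backup_node_dir) / "_state"
    dst = Path(live_node_dir) / "_state"
    if dst.exists():
        shutil.rmtree(dst)      # drop the possibly-corrupt live copy
    shutil.copytree(src, dst)   # restore the backed-up metadata
    return dst
```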

On Fri, Apr 20, 2012 at 6:58 AM, Jack Levin magnito@gmail.com wrote:


Shay, my worry is that _state is corrupted and does not indicate that
a node has all the shards I have data for. Is there a way to edit
_state? I can move up a version (from 0.18) if needed to get it fixed.
(Longer story: during the outage, the backup script possibly overwrote
_state with wrong values, as it ran right when the outage happened.)

-Jack

On Apr 20, 1:38 am, Shay Banon kim...@gmail.com wrote:


If we upgrade to 0.19.x, perhaps it will solve the issue ("Local
Gateway: Move shard state to be stored under each shard, and not
globally under _state", https://github.com/elastic/elasticsearch/issues/1618)?

-Jack

On Apr 20, 11:35 am, Jack Levin magn...@gmail.com wrote:


Hi,

Yes, the 0.19 local storage structure is considerably better compared to
0.18. The upgrade, though, will use the same problematic _state doc that
you have. What you can potentially do is upgrade to 0.19, then copy the
missing shards back to some node, copy the specific _state dir from a
working shard to the ones that are missing, edit the json file in that
state to match the missing shard, and see if things work... (in 0.19,
each shard has its own allocation state).
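(The "edit the json file" step might look like the sketch below. The `shard_id` key is an assumption, not a documented 0.19 format; open a working shard's _state file first and mirror whatever keys you actually see there:)

```python
import json
from pathlib import Path

def patch_shard_state(state_file, shard_id):
    """Adjust a shard _state JSON copied from a working shard so it
    describes the shard it now sits under.

    The "shard_id" field name is illustrative only; inspect a real
    0.19 shard _state file and edit the matching key.
    """
    path = Path(state_file)
    state = json.loads(path.read_text())
    state["shard_id"] = shard_id        # hypothetical field name
    path.write_text(json.dumps(state))  # write the patched state back
    return state
```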

On Fri, Apr 20, 2012 at 10:43 PM, Jack Levin magnito@gmail.com wrote:
