Big issue

jacque74 · April 20, 2012, 3:58am

Hello, after some network problems, and what seemed like a split brain,
we ended up with this state:

{
"cluster_name" : "elasticsearch0",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 11,
"number_of_data_nodes" : 11,
"active_primary_shards" : 212,
"active_shards" : 212,
"relocating_shards" : 1,
"initializing_shards" : 0,
"unassigned_shards" : 38
}

Note that we have 38 unassigned shards and 0 initializing shards. We
have all the data (from backup), but with all 11 servers running, some
shards are still not being assigned to any node. Again, I have all my
index data saved; even those 38 shards are sitting under /index on
various nodes. However, there are no errors in the error log, and we
simply can't exit the RED state. Is there a way to repair the metadata
somehow?
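
For reference, here is a minimal Python sketch of how the counts in that health output can be read. The function name summarize_health is made up for illustration, and the sketch assumes that relocating shards are already included in active_shards, so the total is taken as active + initializing + unassigned.

```python
import json

# Cluster health output copied from the post above.
SAMPLE = """
{
  "cluster_name" : "elasticsearch0",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 11,
  "number_of_data_nodes" : 11,
  "active_primary_shards" : 212,
  "active_shards" : 212,
  "relocating_shards" : 1,
  "initializing_shards" : 0,
  "unassigned_shards" : 38
}
"""


def summarize_health(health_json):
    """Derive a few numbers from a cluster health document (hypothetical helper)."""
    health = json.loads(health_json)
    active = health["active_shards"]
    primaries = health["active_primary_shards"]
    initializing = health["initializing_shards"]
    unassigned = health["unassigned_shards"]
    return {
        "status": health["status"],
        # Active shards that are not primaries, i.e. active replicas.
        "active_replicas": active - primaries,
        # Assumption: relocating shards are counted inside active_shards.
        "total_shards": active + initializing + unassigned,
        # Nothing is initializing, so the unassigned shards are not recovering.
        "stuck": unassigned > 0 and initializing == 0,
    }


if __name__ == "__main__":
    print(summarize_health(SAMPLE))
```

With the numbers above this gives 250 shards in total, no active replicas, and 38 shards that nothing is trying to bring up.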

Thanks,
-Jack

kimchy · April 20, 2012, 8:38am

Are you saying you recovered the data from backup and those shards don't
initialize? Make sure to recover the _state directory under the data
location (nodes/0), especially if you are using 0.18.
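
To check whether a _state directory actually made it back from the backup, something like the following sketch could be run against each node's data location. The function name find_state_dirs and the example path are assumptions for illustration; only the directory name _state and the nodes/0 location come from this thread.

```python
import os


def find_state_dirs(data_root):
    """Return every directory named _state found under data_root.

    Hypothetical helper: it only walks the file system and does not
    read or validate what is stored inside the directories.
    """
    found = []
    for dirpath, dirnames, _filenames in os.walk(data_root):
        if "_state" in dirnames:
            found.append(os.path.join(dirpath, "_state"))
    return sorted(found)


if __name__ == "__main__":
    # Assumed path; replace it with the node's real data location.
    for path in find_state_dirs("/path/to/data/nodes/0"):
        print(path, sorted(os.listdir(path)))
```

An empty result on a node that holds shard data would suggest the _state directory was not restored there.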

jacque74 · April 20, 2012, 6:35pm

Shay, my worry is that _state is corrupted and does not indicate that
a node has all the shards I have data for. Is there a way to edit
_state? I can move up a version (from 0.18) if needed to get it fixed.
(Longer story: during the outage, the backup script may have overwritten
_state with wrong values, as it ran right when the outage happened.)

-Jack

jacque74 · April 20, 2012, 7:43pm

If we upgrade to 0.19.x, perhaps it will solve the issue (see "Local
Gateway: Move shard state to be stored under each shard, and not globally
under _state", Issue #1618, elastic/elasticsearch on GitHub)?

-Jack

kimchy · April 21, 2012, 3:13pm

Hi,

Yes, the 0.19 local storage structure is considerably better than the one
in 0.18. The upgrade, though, will use the same problematic _state doc that
you have. What you can potentially do is upgrade to 0.19, then copy the
missing shards back to some node, copy the specific _state dir from a
working shard to the ones that are missing, edit the JSON file in that
state dir to match the missing shard, and then see if things work... (in
0.19, each shard has its own allocation state).
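
The copy step above could be sketched roughly as follows, presumably with the node stopped. The function name copy_shard_state and the example paths are hypothetical, and the sketch deliberately does not edit the state file, since its exact name and fields are not spelled out in this thread; that part stays a manual step.

```python
import os
import shutil
import time


def copy_shard_state(working_shard_dir, missing_shard_dir):
    """Copy the _state dir of a working shard into a missing shard's dir.

    Hypothetical helper. Any _state dir already present in the target is
    moved aside first, so the operation can be undone by hand.
    Returns (files now in the target _state dir, backup path or None).
    """
    src = os.path.join(working_shard_dir, "_state")
    dst = os.path.join(missing_shard_dir, "_state")
    if not os.path.isdir(src):
        raise FileNotFoundError("no _state directory in " + working_shard_dir)

    backup = None
    if os.path.exists(dst):
        # Keep the old state instead of overwriting it.
        backup = dst + ".bak-" + str(int(time.time()))
        shutil.move(dst, backup)

    shutil.copytree(src, dst)
    return sorted(os.listdir(dst)), backup


if __name__ == "__main__":
    # Assumed paths; replace them with real shard directories.
    files, backup = copy_shard_state("/path/to/index/0", "/path/to/index/7")
    print("copied:", files, "previous state kept at:", backup)
    # Next, edit the copied file(s) by hand so they match the missing shard.
```

Moving the old _state aside rather than deleting it is a cautious choice here, given that a script overwriting state is what may have caused the problem in the first place.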
