ES Ate My Shards/Indexes too


(anghelutar) #1

Hello,

I just ran into a problem (also with ES 0.18.6) very similar to the
one described here:
http://elasticsearch-users.115913.n3.nabble.com/ES-Ate-My-Shards-Indexes-tt3754558.html#none

I have 590 shards and no fewer than 224 of them have gone missing. The
index directories still exist on disk but there is no data inside :(

It all seems to have been caused by a split-brain situation, the causes
of which I'm still analyzing.

Has there been any further investigation on what may have caused the
deletion of index data?

thanks for any hint,
Roxana


(jagdeep) #2

What's in the logs?
I guess they mention dangling indices. It must have happened
because of improper shard distribution across the nodes. Please
post your configuration details (the entries in the yml file).

Regards
jagdeep



(anghelutar) #3

After struggling the whole day to recover as much as possible, I
certainly know more about ES...

I was using a cluster of 7 nodes holding 590 shards, each shard
configured to have one replica.
discovery.zen.minimum_master_nodes was set to 1 on all the nodes
(I have set it to 2 now). Also, discovery.zen.ping.timeout was 3
seconds, which is not enough if the master gets into a condition like
the one described below.
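
For reference, the usual guidance for discovery.zen.minimum_master_nodes is a strict majority of the master-eligible nodes, i.e. floor(n / 2) + 1. A minimal Python sketch of that arithmetic follows. The function name `quorum` is only illustrative, and the example assumes that all 7 nodes are master-eligible, which the thread does not state.

```python
def quorum(master_eligible_nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


# Assumption: all 7 nodes of the cluster are master-eligible.
print(quorum(7))  # 4
```

Under that assumption the majority is 4. A value of 2 would still let a 7-node cluster split into as many as three partitions (for example 2 + 2 + 3) that each elect their own master.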

I'm still not sure what happened, but while trying to recover I
upgraded to 0.19.4. Things seemed better, but something was still
odd: many shards had index directories with no data in them.
I would stop the node (which was a slave), remove the entire node
directory and restart the node.
In ES 0.19 the slave cannot know anything about the shard once I remove
the entire directory. Still, it would recreate the removed directory,
presumably because the master told it to. Then the master would spit
out hundreds of errors per second like this:
[2012-06-11 17:50:33,255][WARN ][cluster.action.shard ] [inuit]
received shard failed for [ng0010305][1],
node[F_TMayYDRDeU0Kb2yOkaTA], [P], s[INITIALIZING], reason [Failed to
start shard, message [IndexShardGatewayRecoveryException[[ng0010305]
[1] shard allocated for local recovery (post api), should exists, but
doesn't]]]

It seems to me that the master is trying to impose a shard config onto
a slave because it somehow thinks that the slave should contain that
shard. This would make sense if the master were trying to replicate
a shard. In this case, however, there is no copy of the shard, and the
master becomes very unresponsive, perhaps simply because it generates
too many error messages like the one above.
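
To see which shards the master keeps retrying, warnings like the one quoted above can be tallied per index and shard. The following is a rough Python sketch, not an official tool. It assumes that each warning sits on a single line in the real log file (the mail client wrapped it here), and the `reason [...]` part of the sample line is deliberately shortened.

```python
import re
from collections import Counter

# Matches the "received shard failed" warning quoted in this post.
SHARD_FAILED = re.compile(
    r"received shard failed for \[(?P<index>[^\]]+)\]\[(?P<shard>\d+)\], "
    r"node\[(?P<node>[^\]]+)\]"
)


def count_shard_failures(lines):
    """Count 'received shard failed' warnings per (index, shard) pair."""
    counts = Counter()
    for line in lines:
        match = SHARD_FAILED.search(line)
        if match:
            counts[(match.group("index"), int(match.group("shard")))] += 1
    return counts


# Sample built from the warning above, joined onto one line; the reason
# text is shortened for the example.
sample = (
    "[2012-06-11 17:50:33,255][WARN ][cluster.action.shard ] [inuit] "
    "received shard failed for [ng0010305][1], "
    "node[F_TMayYDRDeU0Kb2yOkaTA], [P], s[INITIALIZING], reason [...]"
)
print(count_shard_failures([sample]))
```

Feeding it the lines of the master's log file (for example via `open(path)`) would give one counter entry per failing shard.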

So it could be that the original error occurred because the
cluster was stopped while a replication was in progress. When
the cluster came up again, it was without the node that held the
good copy of the shard being replicated. ES then tried to
replicate the incomplete shard onto another node and ended up with
two incomplete copies of the shard. To add to the misery, it could
well be that some new documents were added to the index during this
time. Then, when the original node holding the good copy of the
shard came back up, ES asked it to remove its data since there were
already two nodes with what it considered good copies. Could this be
a possible scenario?

In any case I think ES should not try to do anything with a shard that
has no valid copy.

Currently the cluster is in a somewhat stable state, after shutting
it down and removing all the shards that contained empty index
directories.
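
Shards with empty index directories can be located with a small script before removing anything. The Python sketch below assumes an on-disk layout of `<indices_root>/<index name>/<shard id>/index`, which should be checked against the node's actual data directory. The function name and the path in the usage note are hypothetical.

```python
import os
import sys
from typing import List


def find_empty_shard_indexes(indices_root: str) -> List[str]:
    """Return <index>/<shard>/index directories that contain no entries.

    Assumes the layout <indices_root>/<index>/<shard>/index.
    """
    empty = []
    for index_name in sorted(os.listdir(indices_root)):
        index_dir = os.path.join(indices_root, index_name)
        if not os.path.isdir(index_dir):
            continue
        for shard_id in sorted(os.listdir(index_dir)):
            lucene_dir = os.path.join(index_dir, shard_id, "index")
            # Report only directories that exist and are completely empty.
            if os.path.isdir(lucene_dir) and not os.listdir(lucene_dir):
                empty.append(lucene_dir)
    return empty


if __name__ == "__main__":
    # Each command-line argument is treated as an indices root to scan.
    for root in sys.argv[1:]:
        for path in find_empty_shard_indexes(root):
            print(path)
```

It only lists candidates and deletes nothing. It would be run with the node's indices directory as its argument, for example a hypothetical `/path/to/data/<cluster>/nodes/0/indices`.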

roxana



(anghelutar) #4

Sorry, after rereading my mail I realized that I didn't mention
something: the 590 shards come from 118 indices. Each index is split
into 5 shards and each shard is supposed to be hosted on 2 of the 7
nodes.
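
The numbers above work out as follows. This is only a quick Python check of the arithmetic, and the variable names are illustrative.

```python
indices = 118
shards_per_index = 5
replicas = 1   # one replica per shard, so 2 copies of each shard
nodes = 7

primaries = indices * shards_per_index        # 590 shards
total_copies = primaries * (1 + replicas)     # 1180 shard copies
avg_per_node = total_copies / nodes           # average copies per node

print(primaries, total_copies, round(avg_per_node, 1))  # 590 1180 168.6
```

So each node holds roughly 169 shard copies when the cluster is balanced.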



(anghelutar) #5

Looking further through the logs, it seems that an OutOfMemory error
was also involved.
This brings me to a question: wouldn't it be wise to simply exit the
Java process when an OOM error is encountered?
There is no way to safely recover from it and it can cause really bad
things to happen.



(Shay Banon) #6

You can potentially recover from OOM in certain cases, but you can
certainly configure the VM to exit in case of OOM.



(Nicolae Mihalache) #7

How can you recover from OOM?
Even if the OOM is triggered while processing a query, which you may abort,
there is no guarantee that the OOM will occur in the thread processing the
query. It may occur in another thread, such as the one that handles the
connection to slaves (on a master node), and then the whole cluster
will start behaving strangely.



(Clinton Gormley) #8


You can try freeing the caches, which will hopefully give you enough
memory for the threads that are OOMing to finish their job.

clint



(Nicolae Mihalache) #9


Ok, but that would mean catching the OOM exceptions all over the place and
invoking the cache-cleaning procedure. This is not done (yet) in ES and is
probably not worth doing.
It is easier to detect low-memory conditions and start aborting queries.

In any case, I still think that exiting the VM on OOM is the
easiest way to avoid problems with corrupted/deleted indices.

