Disappearing Shards

Jacob_Perkins · December 20, 2010, 5:45pm

I'm experiencing what I believe may be a bug in the elasticsearch
recovery process. That is, when a node goes down (either from load or
otherwise) and I bring it back up, one or more shards go missing. From
the logs it seems that when the data node is brought back up it is
given an empty shard, thereby overwriting the shard it previously had.
I'm using elasticsearch-0.13.0 with replicas == 1 on all indices.

--jacob

kimchy · December 20, 2010, 6:22pm

How many nodes are in the cluster? Can you post the logs somewhere so I can
have a look at what's going on (with notes on the time it happened)?

Jacob_Perkins · December 20, 2010, 8:15pm

Happened on Friday last week and I can't seem to find the right set of
log files. When it happens again I'll send you more detailed notes +
logs. Thanks,

--jacob

kimchy · December 21, 2010, 5:51am

Great, thanks! The local gateway allocation has been improved in the upcoming
0.14 for some cases where this might happen, but they should be very rare,
so I wondered whether you hit one of them...

Jacob_Perkins · December 21, 2010, 6:23pm

So, I packaged all the logs and made them available here:

http://infochimps-test.s3.amazonaws.com/elasticsearch?AWSAccessKeyId=02S6Y1EFA7ZZ7KCZH3G2&Expires=1293214331&Signature=razCSXlxHjYUXXic985pMTmSLMQ%3D

I'm not sure what I should be looking for. (After adding more machines
to the cluster the entire thing went down, so ...)

--jacob

kimchy · December 21, 2010, 6:26pm

I can't download from the link (I get a broken-link error). What do you mean by
"after adding more nodes the entire thing went down"?

Jacob_Perkins · December 21, 2010, 7:23pm

Evidently I can't make one link for the top-level directory, so here's a list
(one from each node that was up at the time).

After adding more machines a (repair?) started and barraged the new
machines with data. Then the cluster went yellow and shortly after
everything stopped responding.

--jacob

Jacob_Perkins · December 21, 2010, 7:28pm

Evidently I can't make one link for the top-level directory, so here's a list
(one from each node that was up at the time).

After adding more machines a (repair?) started and barraged the new
machines with data. Then the cluster went yellow and shortly after
everything stopped responding.

--jacob

Jacob_Perkins · December 21, 2010, 7:39pm

Evidently I can't make one link for the top-level directory, so here's a list
(one from each node that was up at the time).

After adding more machines a (repair?) started and barraged the new
machines with data. Then the cluster went yellow and shortly after
everything stopped responding.

--jacob

Jacob_Perkins · December 21, 2010, 7:40pm

On Dec 21, 1:28 pm, Jacob Perkins jacob.a.perk...@gmail.com wrote:

Evidently I can't make one link for the top-level directory, so here's a list
(one from each node that was up at the time):

http://infochimps-test.s3.amazonaws.com/elasticsearch/elasticsearch_l...
http://infochimps-test.s3.amazonaws.com/elasticsearch/elasticsearch_l...
http://infochimps-test.s3.amazonaws.com/elasticsearch/elasticsearch_l...
http://infochimps-test.s3.amazonaws.com/elasticsearch/elasticsearch_l...
http://infochimps-test.s3.amazonaws.com/elasticsearch/elasticsearch_l...
http://infochimps-test.s3.amazonaws.com/elasticsearch/elasticsearch_l...
http://infochimps-test.s3.amazonaws.com/elasticsearch/elasticsearch_l...
http://infochimps-test.s3.amazonaws.com/elasticsearch/elasticsearch_l...
http://infochimps-test.s3.amazonaws.com/elasticsearch/elasticsearch_l...
http://infochimps-test.s3.amazonaws.com/elasticsearch/elasticsearch_l...
http://infochimps-test.s3.amazonaws.com/elasticsearch/elasticsearch_l...
http://infochimps-test.s3.amazonaws.com/elasticsearch/elasticsearch_l...
http://infochimps-test.s3.amazonaws.com/elasticsearch/elasticsearch_l...
http://infochimps-test.s3.amazonaws.com/elasticsearch/elasticsearch_l...

After adding more machines a (repair?) started and barraged the new
machines with data. Then the cluster went yellow and shortly after
everything stopped responding.

--jacob

New nodes launched at 2010-12-20 12:25 CST

--jacob
