Elasticsearch red status

Hi,

We have a cluster with 17 nodes and about 30 indices, each with 50 shards,
running Elasticsearch 0.19.9.
We have a big problem getting the ES status to yellow. Sometimes it never
gets to yellow, even after lots of restarting and waiting. I can see in the
head plugin that just one primary shard doesn't get allocated.
Is there any way to solve this? Or is it possible to find which node causes
the problem, so that we can restart only that node?
Of course, I have read in this forum that restarting the cluster could solve
the problem (maybe more than 30 times!). But in our case it is not possible
to restart the cluster so many times.

Thanks,
Vahid

--

Hello Vahid,

Does that unallocated shard cause yellow or red status? I mean, is there a
replica allocated for that shard? If yes, it is automatically promoted to
primary, and the quick solution is to reduce the number of replicas by 1 and
then increase it back again.
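
The replica toggle described above could be scripted roughly as follows. This
is only a minimal sketch, not a tested procedure for 0.19.9: it assumes the
index settings endpoint `PUT /<index>/_settings` with an
`index.number_of_replicas` body, a node reachable at `http://localhost:9200`,
and a placeholder index name `my_index`. The helper names are hypothetical.

```python
import json
import urllib.request


def replica_toggle_requests(base_url, index, current_replicas):
    """Build the two settings updates: drop one replica, then restore it."""
    if current_replicas < 1:
        raise ValueError("need at least one replica to toggle")
    url = f"{base_url.rstrip('/')}/{index}/_settings"
    return [
        (url, {"index": {"number_of_replicas": n}})
        for n in (current_replicas - 1, current_replicas)
    ]


def send(url, body):
    """PUT one settings update; assumes an unauthenticated HTTP endpoint."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Assumed values: adjust the URL, index name and replica count.
    for url, body in replica_toggle_requests("http://localhost:9200", "my_index", 1):
        print("PUT", url, json.dumps(body))
        # send(url, body)  # uncomment to apply against a real cluster
```

The sketch only prints the two requests by default. Between the two calls it
is probably worth waiting until the cluster has settled before restoring the
replica count.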

You can find out which node the shard belongs to, but you'd have to look on
all the nodes, somewhere like this:

$DATA_DIR/$CLUSTER_NAME/nodes/0/indices/$INDEX_NAME/$SHARD_NUMBER

The node that has the directory but doesn't have the shard allocated to it
contains the troublesome shard.
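
Checking for that directory on each node could be automated with something
like the sketch below. It assumes the path layout given above with node
ordinal 0; the data directory, cluster name and index name in the example
call are placeholders, and the function names are hypothetical.

```python
import os


def shard_dir(data_dir, cluster_name, index_name, shard_number, node_ordinal=0):
    """Return the expected on-disk path of one shard on this node."""
    return os.path.join(
        data_dir, cluster_name, "nodes", str(node_ordinal),
        "indices", index_name, str(shard_number),
    )


def has_shard_on_disk(data_dir, cluster_name, index_name, shard_number):
    """True if this node's data directory contains the shard folder."""
    return os.path.isdir(shard_dir(data_dir, cluster_name, index_name, shard_number))


if __name__ == "__main__":
    # Placeholder values; run this on each node with its own settings.
    print(has_shard_on_disk("/var/lib/elasticsearch", "my_cluster", "my_index", 0))
```

Running it on every node and comparing the result with what the head plugin
shows as allocated should point at the node holding the stray copy.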

Best regards,
Radu

http://sematext.com/ -- ElasticSearch -- Solr -- Lucene

--

Hi Radu,
Thank you for your reply.

The cluster status is RED and never gets to yellow or green.
There is one replica configured, and for one shard neither the primary nor
the replica gets allocated, so the status is red. Also, there is no shard
path (as you mentioned) with that shard number.
In my case, there should be a folder with the path
"$DATA_DIR/$CLUSTER_NAME/nodes/0/indices/$INDEX_NAME/8", but there is no
such folder. (8 is the number of the unallocated shard.)
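
For cross-checking what the head plugin shows, the red shards could also be
listed from the cluster health API. The sketch below assumes that
`GET /_cluster/health?level=shards` is available and returns a nested
`indices` -> `shards` -> `status` structure; that response shape should be
verified against 0.19.9 before relying on it. The URL and the sample data are
placeholders, not real output.

```python
import json
import urllib.request


def red_shards(health):
    """List (index, shard_number) pairs whose shard-level status is red."""
    found = []
    for index_name, index_info in health.get("indices", {}).items():
        for shard_id, shard_info in index_info.get("shards", {}).items():
            if shard_info.get("status") == "red":
                found.append((index_name, int(shard_id)))
    return sorted(found)


def fetch_health(base_url="http://localhost:9200"):
    """Fetch shard-level cluster health; the URL is an assumed default."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health?level=shards") as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hand-written sample in the assumed response shape, not real output.
    sample = {
        "status": "red",
        "indices": {
            "my_index": {
                "status": "red",
                "shards": {"7": {"status": "green"}, "8": {"status": "red"}},
            }
        },
    }
    print(red_shards(sample))  # prints [('my_index', 8)]
```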

Best regards,
Vahid

--

Hi Vahid,

OK, so if you don't have the shard contents anywhere you'll probably have
to reindex to get your data back. Or restore from backup.

If you want to know why the shard disappeared (so you can prevent this from
happening again), you have to look for clues in the time before it
disappeared. Did you do a full cluster restart? Is there something relevant
in the logs?

Best regards,
Radu

--

Hi Radu,

Yes, I've just restarted the whole cluster. The only thing I could see in
the logs is something like this:
"org.elasticsearch.index.IndexShardMissingException: [INDEX_NAME][37]
missing"
I had observed such exceptions during cluster startup before, but there was
no problem. Previously, even with such exceptions, after a while ES would
get to yellow and work.
Anyway, thank you. Now I know that I have lost some data.
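
To see which shards these exceptions refer to and how often, the log lines
could be tallied with a small script. This is a rough sketch that only
assumes the message format quoted above; the sample lines are made up for
illustration and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
# org.elasticsearch.index.IndexShardMissingException: [INDEX_NAME][37] missing
PATTERN = re.compile(r"IndexShardMissingException: \[([^\]]+)\]\[(\d+)\]")


def missing_shards(lines):
    """Count IndexShardMissingException occurrences per (index, shard)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


if __name__ == "__main__":
    # Illustrative lines only; in practice iterate over the node's log file.
    sample = [
        "org.elasticsearch.index.IndexShardMissingException: [INDEX_NAME][37] missing",
        "some unrelated log line",
        "org.elasticsearch.index.IndexShardMissingException: [INDEX_NAME][37] missing",
    ]
    for (index_name, shard), count in missing_shards(sample).items():
        print(index_name, shard, count)
```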

Best regards,
Vahid

--

You're welcome :)

Just one quick note: make sure you have appropriate recovery[0] and
minimum_master_nodes[1] settings. Otherwise, restarting your cluster might
lead to dangling indices or split brain.

[0] http://www.elasticsearch.org/guide/reference/modules/gateway/
[1] http://www.elasticsearch.org/guide/reference/modules/discovery/zen.html
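
As a rough illustration of how such values might be chosen, the sketch below
computes a majority quorum for `discovery.zen.minimum_master_nodes` and puts
it next to the gateway recovery settings. It assumes all 17 nodes are
master-eligible, and the `gateway.*` values are placeholders to adapt, not
recommendations. The function names are hypothetical.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Majority quorum: more than half of the master-eligible nodes."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


def suggested_settings(total_nodes, master_eligible_nodes, recover_after_nodes):
    """Collect the settings discussed above into one dict."""
    return {
        "discovery.zen.minimum_master_nodes": minimum_master_nodes(master_eligible_nodes),
        "gateway.expected_nodes": total_nodes,
        # How many nodes must be present before recovery starts: your call.
        "gateway.recover_after_nodes": recover_after_nodes,
    }


if __name__ == "__main__":
    # Assumption: all 17 nodes are master-eligible; 15 is an arbitrary example.
    for key, value in suggested_settings(17, 17, 15).items():
        print(f"{key}: {value}")
```

With 17 master-eligible nodes the quorum comes out as 9, so a partition
holding fewer than 9 nodes could not elect its own master.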

Best regards,
Radu

--