Unassigned shards


(Mohamed Lrhazi) #1

Hello,

It seems we overwhelmed our 6-node cluster with indexes: we started
getting "too many open files" errors.
We added 6 more nodes and also upgraded to the latest version
(elasticsearch-0.90.7-1.noarch).

Now the cluster starts and loads most shards, except for 58, for which
the only log entries I can find are lines like these:

[2013-11-18 13:19:27,277][DEBUG][gateway.local] [rap-es] [el_2003][11]: not allocating, number_of_allocated_shards_found [0], required_number [1]
[2013-11-18 13:19:27,277][DEBUG][gateway.local] [rap-es] [de_2005][10]: not allocating, number_of_allocated_shards_found [0], required_number [1]
[2013-11-18 13:19:27,277][DEBUG][gateway.local] [rap-es] [de_2005][4]: not allocating, number_of_allocated_shards_found [0], required_number [1]
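
With many such lines, it can help to group them by index to see which indexes are affected. The following is only a rough sketch: the regular expression is modeled on the three sample lines above (not on any documented log format), and the function name is made up for illustration.

```python
import re
from collections import defaultdict

# Pattern modeled on the gateway.local DEBUG lines quoted above (an assumption,
# not a documented format). \s+ also tolerates lines wrapped by a mail client.
LINE_RE = re.compile(
    r"\[(?P<index>[^\]]+)\]\[(?P<shard>\d+)\]:\s+not\s+allocating,\s+"
    r"number_of_allocated_shards_found\s+\[(?P<found>\d+)\],\s+"
    r"required_number\s+\[(?P<required>\d+)\]"
)


def unallocated_by_index(log_text):
    """Map each index name to the sorted shard ids logged as 'not allocating'."""
    shards = defaultdict(set)
    for match in LINE_RE.finditer(log_text):
        shards[match.group("index")].add(int(match.group("shard")))
    return {index: sorted(ids) for index, ids in shards.items()}


sample = """\
[2013-11-18 13:19:27,277][DEBUG][gateway.local] [rap-es] [el_2003][11]: not
allocating, number_of_allocated_shards_found [0], required_number [1]
[2013-11-18 13:19:27,277][DEBUG][gateway.local] [rap-es] [de_2005][10]: not
allocating, number_of_allocated_shards_found [0], required_number [1]
[2013-11-18 13:19:27,277][DEBUG][gateway.local] [rap-es] [de_2005][4]: not
allocating, number_of_allocated_shards_found [0], required_number [1]
"""

print(unallocated_by_index(sample))
```

For the three sample lines this reports shard 11 of el_2003 and shards 4 and 10 of de_2005.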

What would cause this problem, and how do we recover from it?

cluster health:

{
  "cluster_name" : "foo",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 13,
  "number_of_data_nodes" : 12,
  "active_primary_shards" : 4154,
  "active_shards" : 8308,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 56
}

Thanks a lot,
Mohamed.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Mohamed Lrhazi) #2

After I pasted the above, I noticed the number of unassigned shards had
actually gone down from 58 to 56. Maybe it will keep going down over the
next several hours! I'll update this thread in a few hours.

Thanks,
Mohamed.



(Boaz Leskes) #3

Hi Mohamed,

If you get "too many open files" errors, you should probably raise the
file descriptor limit as described here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-configuration.html#file-descriptors
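
As a quick sanity check before restarting, the open-file limit can be read from Python's standard `resource` module (Unix only). This is just a sketch: it reports the limit of the process that runs it, so it only says something about Elasticsearch if it is run as the same user and in the same environment. The 32000 threshold and the function names are assumptions for illustration, not values taken from this thread.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def limit_is_low(soft, threshold=32000):
    """True if the soft limit is below the (assumed) threshold."""
    if soft == resource.RLIM_INFINITY:
        return False  # unlimited is never "too low"
    return soft < threshold


soft, hard = open_file_limits()
print("soft limit:", soft, "hard limit:", hard)
if limit_is_low(soft):
    print("soft limit looks low for a node holding many shards")
```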

Also, the number of shards per node is very high. Unless you know for sure
that you are not searching all (or many) of them, you can expect CPU issues,
as they are searched in parallel.
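
To put a number on "very high", here is the arithmetic using the figures from the cluster health output posted earlier in this thread. It is a sketch that assumes shards are spread evenly over the data nodes, which the thread does not confirm; the function name is made up for illustration.

```python
# Figures copied from the cluster health output posted in this thread.
health = {
    "number_of_data_nodes": 12,
    "active_shards": 8308,
    "unassigned_shards": 56,
}


def shards_per_data_node(health, include_unassigned=True):
    """Average shard count per data node, assuming an even spread."""
    total = health["active_shards"]
    if include_unassigned:
        total += health["unassigned_shards"]
    return total / health["number_of_data_nodes"]


print(shards_per_data_node(health))                            # 697.0
print(round(shards_per_data_node(health, include_unassigned=False), 1))  # 692.3
```

So each data node holds roughly 700 shards once everything is assigned.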

Cheers,
Boaz



(Mohamed Lrhazi) #4

Thanks, Boaz. We had increased the allowed max open files, and for the moment
we no longer see that error. We still have unassigned shards, though. It
seems a colleague of mine was working on the issue too and has been deleting
indexes that were empty and had unassigned shards.

Could that be the issue: that empty or almost empty indexes (say two
documents, while the index has 12 shards) somehow fail to get assigned? Would
that be a known issue?

Thanks a lot,
Mohamed.



(Boaz Leskes) #5

Hi Mohamed,

The number of documents in a shard should not affect whether it's assigned
or not.

When a shard fails too often on a node, ES does not try to allocate that
shard on that node anymore. I think what happened is that shard allocation
failed too often due to the max open files issue, and all the nodes got
blacklisted for those shards. Can you try restarting the cluster with the
new max open files settings and see if that helps?

Cheers,
Boaz



(Mohamed Lrhazi) #6

Thanks, Boaz. We tried restarting a few times to no avail, so we just
went ahead and deleted the affected indexes and recreated them.

Mohamed.


