Permanent unassigned shards in latest logstash index


(Alexander Gray II) #1

We have a 3 node m1.large ES/logstash cluster with version 0.90.3.
Everything has been running fine for a very long time.

We needed to upgrade our cluster to use m2.xlarge, so we killed a node and
brought up a brand new m2.xlarge machine and let the cluster go from
yellow to green.

But when we replaced the next machine, things went south.

The latest logstash index has all of its primary and replica shards
unassigned, and they are stuck that way.
I'm attaching a screenshot from the head plugin of the last index (you can
see that we have replicas = 1):

https://lh3.googleusercontent.com/-pgfCiEKaBkE/Uz4cX0ad0yI/AAAAAAAADKY/BcRNvxa1sIY/s3200/Screen+Shot+2014-04-03+at+10.41.30+PM.png

The only exception I see in the logs is:

[2014-04-04 01:59:06,829][DEBUG][action.admin.indices.close] [Crimson Dynamo] failed to close indices [logstash-2014.04.04]
org.elasticsearch.indices.IndexPrimaryShardNotAllocatedException: [logstash-2014.04.04] primary not allocated post api
    at org.elasticsearch.cluster.metadata.MetaDataIndexStateService$1.execute(MetaDataIndexStateService.java:95)
    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:285)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:143)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)

I've tried restarting the cluster, but no love.

My cluster is no longer consuming any logs because of this.

Any idea how I can even begin to troubleshoot this?
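To begin troubleshooting, a first step might be to pull `_cluster/health` and look at the shard counters. A minimal sketch, assuming a node answers on the default HTTP port 9200 (the `sample` dict below is an illustrative, abbreviated response shape, not real output from this cluster):

```python
import json
import urllib.request

ES = "http://localhost:9200"  # assumption: a node reachable on the default HTTP port

def cluster_health(es=ES):
    # GET /_cluster/health reports status (green/yellow/red) plus shard counters
    with urllib.request.urlopen(es + "/_cluster/health") as resp:
        return json.loads(resp.read().decode("utf-8"))

def unassigned_count(health):
    # Pure helper: pull the unassigned-shard counter out of a health response
    return health.get("unassigned_shards", 0)

# Abbreviated shape of a health response from a cluster in this state (hypothetical):
sample = {"status": "red", "initializing_shards": 0, "unassigned_shards": 10}
print(unassigned_count(sample))  # → 10
```

With `level=indices` in the query string, the same endpoint breaks the counters down per index, which narrows the problem to `logstash-2014.04.04` quickly.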

Thanks,

alex

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8dc86c75-7a31-4885-8450-d44cc833140b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Mark Walkom) #2

If it has data you're ok with losing, just delete the index and let it get
recreated automatically.
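Something like this, sketched in Python (0.90-era API, default port assumed; the index name is the one from the log line above, and Logstash recreates the daily index on the next event):

```python
import urllib.request

ES = "http://localhost:9200"  # assumption: default HTTP port

def delete_index(name, es=ES):
    # DELETE /{index} drops the index and all of its shards -- this loses data.
    req = urllib.request.Request("%s/%s" % (es, name), method="DELETE")
    with urllib.request.urlopen(req) as resp:
        return resp.status

# The stuck index named in the exception:
index = "logstash-2014.04.04"
print("%s/%s" % (ES, index))  # the URL the DELETE would hit
```

The equivalent one-liner is `curl -XDELETE 'http://localhost:9200/logstash-2014.04.04'`.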

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com



(Mohit Anchlia) #3

Is there a way to fix this type of issue without having to delete an index?



(Alexander Gray II) #4

No love. I deleted the index via the head plugin, but the index is still
there, and the shards are still unassigned.
Nothing in the logs showed any errors either.
Maybe this is not the proper way to delete an index? (or maybe it got
deleted and re-created so fast that I missed it...)



(Alexander Gray II) #5

Maybe I have to do this?

http://stackoverflow.com/questions/19967472/elasticsearch-unassigned-shards-how-to-fix

i.e.:

curl -XPUT 'localhost:9200//_settings' -d '{"index.routing.allocation.disable_allocation": false}'

I have no idea what that does, but it can't get any more broken than it is,
so I'll give it a try...
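For reference, a sketch of what that call does: it flips the `index.routing.allocation.disable_allocation` setting back to false so the master may assign the index's shards again (which only helps if it was ever set to true). The setting name comes from that curl line, not from anything verified against this cluster:

```python
import json
import urllib.request

ES = "http://localhost:9200"  # assumption: default HTTP port

def settings_body(enable=True):
    # Body for PUT /{index}/_settings; disable_allocation=false re-enables allocation
    return json.dumps({"index.routing.allocation.disable_allocation": not enable})

def update_settings(index, es=ES):
    # Issue the PUT against a single index's settings endpoint
    req = urllib.request.Request(
        "%s/%s/_settings" % (es, index),
        data=settings_body(True).encode("utf-8"),
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

print(settings_body(True))
```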



(Mark Walkom) #6

You should only need to do that if you set disable_allocation to true to begin with.

What version are you on? How many nodes, indexes, shards? Try installing a
plugin like elastichq or marvel to give you a better idea of what your
cluster status is. Bigdesk is good, but you only see what one individual
node is doing.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com



(Alexander Gray II) #7

We have a 3 node m2.xlarge (17 gigs of memory) ES/logstash cluster with
version 0.90.3.

ES_MIN_MEM=8g,ES_MAX_MEM=8g
5 shards, 1 replica.
I installed head, bigdesk, and paramedic.
There are 33 indexes. We are using this with logstash.
We have 64K open file descriptors (logstash reports everything is cool in
this area).
CPU usage is low.
I'm attaching a screenshot of the troubling indexes, particularly the last
one, which doesn't look like it's doing anything (inline screenshot).

I guess I'll install elastichq or marvel. Maybe that can tell me more.
The only exception I see in the logs is in my first message, but googling
that exception doesn't give any hits (besides where that string exists in
their source code).



(Alexander Gray II) #8

I installed elastichq.
Interestingly enough, it doesn't even show that index, but
head/paramedic/bigdesk do. Weird.
All of elastichq's diagnostics show mostly green. There are a few
"yellows" under Index Activity for "Get Total", but that doesn't strike me
as related to this.
What I did find is that 2 of the shards have been in the "initializing"
state for quite some time.
Sounds bad, but maybe I should wait; maybe it will just go away.
Note that since this happened, we are pretty much dead in the water, since
no new logs are being ingested at all.
I almost want to say things worked better with the m1.larges than with
the m2.xlarges, but I don't see how that is possible.
I'm open to any wild suggestions to get this cluster back to a working
state.
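If it helps, here is the kind of check I mean while waiting on those "initializing" shards: a sketch that polls the health counters until nothing is initializing or unassigned (the two dicts are hypothetical response shapes, default port assumed for the real fetch):

```python
import time

def is_settled(health):
    # Pure helper: a cluster is "settled" when no shards are moving or stranded
    return (health.get("initializing_shards", 0) == 0
            and health.get("unassigned_shards", 0) == 0)

def wait_until_settled(fetch_health, tries=30, delay=10):
    # Poll a health-fetching callable; return True once shards stop moving.
    for _ in range(tries):
        if is_settled(fetch_health()):
            return True
        time.sleep(delay)
    return False

# Shapes seen in this thread (hypothetical counts):
stuck = {"status": "red", "initializing_shards": 2, "unassigned_shards": 8}
fine = {"status": "green", "initializing_shards": 0, "unassigned_shards": 0}
print(is_settled(stuck), is_settled(fine))  # → False True
```

Shards that stay in "initializing" past a few poll cycles usually mean recovery is genuinely stuck rather than slow.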



(Alexander Gray II) #9

OK, I stopped the entire cluster and started one ES node at a time, and
that seemed to do the trick, even though that's one of the first things I
did when things went awry.
I have no idea how it could have gotten into that state to begin with, but
it's all good now.
We lost a ton of logs, but everything looks OK now.
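For anyone hitting this later: during a one-node-at-a-time restart, the `wait_for_status` parameter on the health endpoint makes a convenient gate between node starts, since the call blocks until the cluster reaches that status or the timeout passes. A sketch (default port assumed):

```python
import urllib.request

ES = "http://localhost:9200"  # assumption: default HTTP port

def health_url(status="yellow", timeout="60s", es=ES):
    # /_cluster/health?wait_for_status=... blocks until the cluster reaches
    # the given status (or the timeout elapses) -- a gate between node starts.
    return "%s/_cluster/health?wait_for_status=%s&timeout=%s" % (es, status, timeout)

def wait_for(status="yellow", es=ES):
    # Returns once the cluster is at least `status`; raises on HTTP errors.
    with urllib.request.urlopen(health_url(status, es=es)) as resp:
        return resp.status == 200

print(health_url("green"))
```

Start a node, call `wait_for("yellow")`, and only then move on to the next node; finish with `wait_for("green")`.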
Alex



(system) #10