We have a 3-node m1.large ES/logstash cluster running version 0.90.3.
Everything has been running fine for a very long time.
We needed to upgrade our cluster to use m2.xlarge, so we killed a node,
brought up a brand new m2.xlarge machine, and let the cluster go from
yellow to green.
But when we did the next machine, things went south.
The latest logstash index has all of its primary and replica shards
unassigned, and they are stuck that way.
I'm attaching a screenshot from the head plugin of the last index (you can
see that we have replicas = 1):
[2014-04-04 01:59:06,829][DEBUG][action.admin.indices.close] [Crimson Dynamo] failed to close indices [logstash-2014.04.04]
org.elasticsearch.indices.IndexPrimaryShardNotAllocatedException: [logstash-2014.04.04] primary not allocated post api
        at org.elasticsearch.cluster.metadata.MetaDataIndexStateService$1.execute(MetaDataIndexStateService.java:95)
        at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:285)
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:143)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
I've tried restarting the cluster, but no love.
My cluster is no longer consuming any logs because of this.
Any idea how I can even begin to troubleshoot this?
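For reference, the shard allocation can also be inspected directly over the REST API rather than through head. A rough sketch against the standard 0.90 endpoints (host and port assumed to be localhost:9200; adjust for your setup):

    # cluster health broken down per shard, showing which ones are unassigned
    curl -s 'localhost:9200/_cluster/health?level=shards&pretty=true'

    # full cluster state, including the routing table with any UNASSIGNED entries
    curl -s 'localhost:9200/_cluster/state?pretty=true'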
No love. I deleted the index via the head plugin, but the index is still
there, and the shards are still unassigned.
Nothing in the logs showed any errors either.
Maybe this is not the proper way to delete an index? (or maybe it got
deleted and re-created so fast that I missed it...)
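If the delete from the head UI didn't take, the delete can also be issued straight against the REST API; a sketch (index name copied from the log above, host assumed to be localhost:9200):

    # drop the problem index; logstash will auto-create the daily index again when it next writes
    curl -XDELETE 'localhost:9200/logstash-2014.04.04'

    # confirm it is gone (an IndexMissingException here means the delete worked)
    curl -s 'localhost:9200/logstash-2014.04.04/_status?pretty=true'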
I have no idea what that does, but it can't get any more broken than it is,
so I'll give it a try...
You should only need to do that if you issued a true to begin with.
What version are you on? How many nodes, indexes, and shards? Try installing
a plugin like elastichq or marvel to get a better idea of what your cluster
status is. Bigdesk is good, but it only shows what one individual node is
doing.
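For completeness, site plugins of that era are installed with the bundled plugin script; a sketch (the GitHub paths below are the commonly used ones, so double-check them against each plugin's README):

    # run from the elasticsearch home directory on a node
    bin/plugin -install royrusso/elasticsearch-HQ    # elastichq
    bin/plugin -install mobz/elasticsearch-head      # head

    # then browse to http://<node>:9200/_plugin/HQ/ or http://<node>:9200/_plugin/head/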
We have a 3-node m2.xlarge (17 GB of memory) ES/logstash cluster with
version 0.90.3.
ES_MIN_MEM=8g, ES_MAX_MEM=8g
5 shards, 1 replica.
I installed head, bigdesk, and paramedic.
There are 33 indexes. We are using this with logstash.
We have 64K open file descriptors (logstash reports everything is cool in
this area).
CPU usage is low.
I'm attaching a screenshot of the troubling indexes, particularly the last
one, which doesn't look like it's doing anything at all.
I guess I'll install elastichq or marvel. Maybe that can tell me more.
The only exception I see in the logs is the one in my first message, but
googling that exception doesn't give any hits (besides the place in the
source code where that string exists).
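To cross-check those numbers against what the nodes themselves report (heap actually allocated, open file descriptors), the nodes stats API can help; a sketch (the flag names are the 0.90-era ones, so adjust if your point release differs):

    curl -s 'localhost:9200/_nodes/stats?jvm=true&process=true&pretty=true'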
I installed elastichq.
Interestingly enough, it doesn't even show that index, but
head/paramedic/bigdesk do. Weird.
All of the elastichq diagnostics show mostly green. There are a few
"yellows" under Index Activity for "Get Total", but that doesn't strike me
as something related to this.
What I did find was that 2 of the shards have been in the "initializing"
state for quite some time.
Sounds bad, but maybe I should wait; maybe it will just go away.
Note that since this happened, we are pretty much dead in the water, since
no new logs are being ingested at all.
I almost want to say things worked better with the m1.larges than with the
m2.xlarges, but I don't see how that is possible.
I'm open to any wild suggestions to get this cluster back to a working
state.
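One wild, last-resort option from that era is the cluster reroute API, which can force an unassigned shard onto a specific node. This is only a sketch: the node name and shard number are placeholders, and allow_primary means accepting the loss of whatever data was on that primary:

    curl -XPOST 'localhost:9200/_cluster/reroute?pretty=true' -d '{
      "commands": [
        { "allocate": {
            "index": "logstash-2014.04.04",
            "shard": 0,
            "node": "SOME_NODE_NAME",
            "allow_primary": true
        } }
      ]
    }'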
OK, I stopped the entire cluster and started one ES node at a time, and
that seemed to do the trick, even though that's one of the first things I
did when things went awry.
I have no idea how it could have gotten into that state to begin with, but
it's all good now.
We lost a ton of logs, but it looks like everything is OK now.
Alex
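A note for anyone who lands on this thread with the same symptoms: if you do end up doing a full restart, it is gentler on recovery to disable allocation first and re-enable it once every node has rejoined. A sketch using the 0.90-era setting name (it was renamed in later releases):

    # before stopping the nodes
    curl -XPUT 'localhost:9200/_cluster/settings' -d '{
      "transient": { "cluster.routing.allocation.disable_allocation": true }
    }'

    # ...restart the nodes one at a time and wait for them to rejoin...

    # once the cluster has formed again
    curl -XPUT 'localhost:9200/_cluster/settings' -d '{
      "transient": { "cluster.routing.allocation.disable_allocation": false }
    }'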