Indexing hangs and stuck allocations in 0.19.3 - sound familiar at all?

Hi All,
We've been running 0.19.3 happily on a few production clusters for at least
a year with no problems; we haven't even needed to restart anything. We are
in the process of upgrading to the latest 0.90, and this is just a feeler to
see whether anybody else has hit a similar issue so we can do a proper
postmortem.

Here is basically the chain of events:

  • Cleaned up an old index and swapped in a new one, which also bumped the
    replica count from 0 to 1. Just mentioning it because it is the only thing
    we did remotely close to when things went wrong.
  • 9 hours later an index began hanging requests to index new docs.
  • This caused our index queues to get backed up and some monitoring alarms
    to start going off, so we were aware of the issue.
  • The cluster state was green, and we did the following to try to resolve it:
  • Restarted our indexer application, which got things going again for a few
    minutes, but things got gummed up again shortly after.
  • Set replicas down to 0 and then back to 1 for the suspect indexes
  • The new replicas couldn't recover and were stuck in initializing, so the
    cluster was in a yellow state. This was interesting.
  • Opened and closed the indexes that could potentially be the problem. Made
    no difference.
  • Increased concurrent recoveries (from 1 to 5). This got me down to 9
    shards stuck in init. (The settings calls for this and the replica flip
    are sketched just after this list.)
  • I tried creating a new index to rebuild some content I suspected was
    corrupt; that new index pushed the cluster state to red and was stuck
    trying to init.
  • At this point, I decided it was best to restart the cluster. Things came
    up clean and I don't believe there was any data corruption.
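
For concreteness, the replica flip and the recovery bump were along the lines
of the following index-settings and cluster-settings calls (a rough Python
sketch, not our actual tooling; the localhost URL and the index name are
placeholders, and I'm assuming node_concurrent_recoveries is dynamically
updatable on your version):

import json
import urllib.request

ES = "http://localhost:9200"  # placeholder; point this at a node in the cluster

def es_put(path, body):
    """PUT a JSON body to the Elasticsearch REST API and return the parsed response."""
    req = urllib.request.Request(
        ES + path,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Flip replicas to 0 and back to 1 on a suspect index ("suspect_index" is a placeholder).
es_put("/suspect_index/_settings", {"index": {"number_of_replicas": 0}})
es_put("/suspect_index/_settings", {"index": {"number_of_replicas": 1}})

# Bump per-node concurrent recoveries (transient, so it does not survive a full restart).
es_put("/_cluster/settings", {
    "transient": {"cluster.routing.allocation.node_concurrent_recoveries": 5}
})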

Does this sound familiar to anyone?

Here are a few bugs that I think could be related:



Many thanks for taking the time to read.

Best Regards,
Paul


I also had uninitialized shards in 0.19, and they went away after a full
restart. Is there anything in the log files? Maybe file descriptor usage was
exceeded, or there was some other unexpected resource shortage? I went to
0.19.11 after that, but I can't tell whether it helped or not.

Jörg


The elasticsearch logs are silent about the issue. File descriptors are fine,
although I have seen what you're describing in our acceptance env on 0.19.3,
where file descriptor usage just starts climbing. We default to 128K open and
start alarming at 64K.
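
For reference, the per-node open file descriptor count is visible through the
nodes stats API; here is a rough sketch of pulling it (localhost is a
placeholder, and the flag-style ?process=true parameter is what I'd expect
the 0.90-era API to take):

import json
import urllib.request

ES = "http://localhost:9200"  # placeholder; any node will do

# Ask for the process section of node stats (flag-style parameters on 0.90-era versions).
with urllib.request.urlopen(ES + "/_nodes/stats?process=true") as resp:
    stats = json.loads(resp.read())

for node_id, node in stats["nodes"].items():
    fds = node.get("process", {}).get("open_file_descriptors")
    print(node.get("name", node_id), "open file descriptors:", fds)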

RAM, disk, and CPU all look fine. Documents coming in from a different queue
kept flowing without issue; it was one specific index that went AWOL on the
indexing side.
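
If anyone is chasing something similar, index-level cluster health is one way
to see which index the stuck or unhappy shards belong to (again just a
sketch; localhost is a placeholder):

import json
import urllib.request

ES = "http://localhost:9200"  # placeholder

# level=indices breaks the health report down per index, so a yellow/red index
# with initializing shards stands out from the rest.
with urllib.request.urlopen(ES + "/_cluster/health?level=indices") as resp:
    health = json.loads(resp.read())

for name, idx in health["indices"].items():
    if idx["status"] != "green" or idx["initializing_shards"] > 0:
        print(name, idx["status"], "initializing:", idx["initializing_shards"])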

Thanks!
Paul

