We've been running 0.19.3 happily in production for at least a year on a few
clusters, with no problems; we haven't even needed to restart anything. We
are in the process of upgrading to the latest 0.90, and this post is just a
feeler to see whether anybody else has hit a similar issue, so we can do a
proper postmortem.
Here is basically the chain of events:
- Cleaned up an old index and swapped in a new one, which also changed the
replica count from 0 to 1. Mentioning this only because it is the only
change we made remotely close to when things went wrong.
- 9 hours later an index began hanging requests to index new docs.
- This caused our indexing queues to back up and some monitoring alarms to
go off, so we were aware of the issue.
- The cluster state was green. We did the following to try to resolve it:
- Restarted our indexer application, which got things going again for a few
minutes, but things got gummed up again shortly after.
- Set replicas down to 0 and then back to 1 for the suspect indexes
- The new replicas couldn't recover and were stuck initializing, so the
cluster went into a yellow state. This was interesting.
- Opened and closed the indexes that could potentially be the problem.
- Increased concurrent recoveries (from 1 to 5). This got me down to 9
shards stuck in init.
- I tried creating a new index to rebuild some content I suspected was
corrupt, and that new index pushed the cluster state to red and got stuck
trying to initialize.
- At this point, I decided it was best to restart the cluster. Things came
up clean and I don't believe there was any data corruption.
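For reference, the replica toggle and the concurrent-recoveries bump
described above were done through the standard settings APIs; roughly like
this (the host and index name are placeholders for our actual setup):

```shell
# Drop replicas to 0 and back to 1 for a suspect index
# ("my_index" is a placeholder)
curl -XPUT 'http://localhost:9200/my_index/_settings' -d '
{ "index": { "number_of_replicas": 0 } }'

curl -XPUT 'http://localhost:9200/my_index/_settings' -d '
{ "index": { "number_of_replicas": 1 } }'

# Raise concurrent recoveries per node cluster-wide (transient, 1 -> 5)
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '
{ "transient": {
    "cluster.routing.allocation.node_concurrent_recoveries": 5 } }'
```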
Does this sound familiar to anyone?
Here are a few bugs that I think could be related:
Many thanks for taking the time to read.