We've been running 0.19.3 happily in production for at least a year on a few
clusters, with no problems; we haven't even needed to restart anything. We
are in the process of upgrading to the latest 0.90, and this post is just a
feeler to see whether anybody else has hit a similar issue, so we can do a
proper postmortem.
Here is basically the chain of events:
- Cleaned up an old index and swapped in a new one, which also changed the
replica count from 0 to 1. Mentioning this only because it is the only
change we made remotely close to when things went wrong.
- 9 hours later an index began hanging requests to index new docs.
- This caused our indexing queues to back up and some monitoring alarms to
go off, so we were aware of the issue.
- The cluster state was green. We did the following to try to resolve it:
- Restarted our indexer application, which got things going again for a few
minutes, but things got gummed up again shortly after.
- Set replicas down to 0 and then back to 1 for the suspect indexes
- The new replicas couldn't recover and were stuck initializing, so the
cluster went into a yellow state. This was interesting.
- Opened and closed the indexes that could potentially be the problem.
- Increased concurrent recoveries (from 1 to 5). This got me down to 9
shards stuck in init.
- I tried creating a new index to rebuild some content I suspected was
corrupt, and that new index pushed the cluster state to red and got stuck
trying to initialize.
- At this point, I decided it was best to restart the cluster. Things came
up clean and I don't believe there was any data corruption.
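For reference, the replica toggle and the concurrent-recoveries bump
described above were done through the standard settings APIs; roughly like
this (the host and index name are placeholders for our actual setup):

```shell
# Drop replicas to 0 and back to 1 for a suspect index
# ("my_index" is a placeholder)
curl -XPUT 'http://localhost:9200/my_index/_settings' -d '
{ "index": { "number_of_replicas": 0 } }'

curl -XPUT 'http://localhost:9200/my_index/_settings' -d '
{ "index": { "number_of_replicas": 1 } }'

# Raise concurrent recoveries per node cluster-wide (transient, 1 -> 5)
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '
{ "transient": {
    "cluster.routing.allocation.node_concurrent_recoveries": 5 } }'
```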
Does this sound familiar to anyone?
Here are a few bugs that I think could be related:
Many thanks for taking the time to read.