So, a little more digging and it looks like the process was still holding
onto a write.lock file that had already been deleted.
sudo lsof -uelasticsearch | grep 'legacy/0'
java 27517 elasticsearch 1042uW REG 202,1 0 525279 /var/data/elasticsearch/Rage Against the
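For reference, a couple of ways to confirm a deleted-but-still-open file
like this (the PID is just the one from the output above, adjust for your
node):

sudo lsof +L1 -u elasticsearch             # open files with link count 0, i.e. deleted but still held
sudo ls -l /proc/27517/fd | grep deleted   # same check via procfs on the java process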
We did delete some leftover lock files after the nodes powered down, but
that seems like it shouldn't have caused this (unless we made a mistake and
nuked it on a live instance). Somehow that plus the OOM corruption led to a
pretty crazy situation. We're almost recovered after some restarts; we
should be able to put together a blog post on the situation afterward. I'll
follow up with results and a link ASAP.
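In case it helps anyone else, this is the rough shape of the cleanup we
intended, only ever with the node fully stopped (the path matches our
layout, not necessarily yours):

# refuse to touch locks if an elasticsearch java process is still up
pgrep -u elasticsearch java && echo 'still running, do NOT delete locks' \
  || sudo find /var/data/elasticsearch -name write.lock -delete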
On Tuesday, December 17, 2013 8:13:39 PM UTC-8, Bryan Helmig wrote:
We're also fine with losing a few docs as we can reindex them from
another source, so dropping the documents works for us.
On Tuesday, December 17, 2013 7:47:21 PM UTC-8, Bryan Helmig wrote:
All replicas have the same corruption, it seems. We can't get a primary
up for shard 0, therefore the replica never comes up. Does that make sense?
On Tuesday, December 17, 2013 6:33:19 PM UTC-8, Jörg Prante wrote:
Hm, just wanted to clarify that I'm not familiar with the effects of
"index.shard.check_on_startup: fix" on the latest ES with Lucene 4.
Even if I can test it, there is no guarantee that it works for you.
Different systems, different index, different corruptions... who knows.
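For completeness, the usual way to try that setting would look something
like the following (untested by me, and assuming "legacy" is the index name
from your grep; if your version refuses this non-dynamic setting on a
closed index, it has to go in elasticsearch.yml before a node restart
instead):

curl -XPOST 'localhost:9200/legacy/_close'
curl -XPUT 'localhost:9200/legacy/_settings' -d '{"index.shard.check_on_startup": "fix"}'
curl -XPOST 'localhost:9200/legacy/_open'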
I'm quite puzzled: you don't have a replica shard? CheckIndex is really a
last resort if there are no replicas, and it is not the preferred method to
ensure data integrity in ES...
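If you do go down that road, running CheckIndex by hand looks roughly like
this, only ever against a copy of the shard, and with the jar, Lucene
version and data path adjusted to your install (I have not verified any of
this on your setup):

java -cp /usr/share/elasticsearch/lib/lucene-core-4.5.1.jar \
  org.apache.lucene.index.CheckIndex \
  /var/data/elasticsearch/<cluster>/nodes/0/indices/legacy/0/index -fix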