We have been struggling with this issue for a few months. We've experienced
it in versions 0.90.6 - 0.90.13 and now in 1.1, too.
A shard (sometimes 2) will fail within a single index. We get this error
during or after our data loader indexes data. Sometimes it takes a day or
two to occur, but most recently it has happened immediately on or after the first indexing run.
The shard that fails is always in the same index. This is a 2-node cluster
running on CentOS 6.5 with Oracle Java 1.7.0u51. In addition, there is 1
non-data node for the process that handles indexing data and 2 non-data
nodes serving the front-end. All non-data nodes are Java clients using
the spring-data-elasticsearch library. All nodes are on 1.1 now. We understand
there's a possibility that our loader application is causing this, but we
can't see how or where. Also, it seems like a bug if a client can cause
shards to fail on the server. We are grasping at straws now and appreciate
any ideas on what could be causing this.
In this gist, the first log message is from node 1 and happens at the same
time that the shard failure occurs on node 2. See node 2 for the stack trace(s)
logged when the shard fails. It's interesting that node 1 says it's
closing the connection because of it. Someone on Twitter noted that these
are WARN level messages and don't signify a failure. However, it is causing
queries against this index to totally fail, so there's definitely more than
a WARN scenario going on here. Any thoughts?