Shard failures


(Drew Blessing) #1

We have been struggling with this issue for a few months. We've experienced
it in versions 0.90.6 - 0.90.13 and now in 1.1, too.

A shard (sometimes 2) will fail within a single index. We get this error
during or after our data loader indexes data. Sometimes it takes a day or
two to occur, but most recently it has happened immediately on or after the
first indexing run.
The shard that fails is always in the same index. This is a 2-node cluster
running on CentOS 6.5 with Oracle Java 1.7.0u51. In addition, there is 1
non-data node for the process that handles indexing data and 2 non-data
nodes serving the front-end. All non-data nodes are Java clients using the
spring-data-elasticsearch library. All nodes are on 1.1 now. We understand
there's a possibility that our loader application is causing this but we
can't see how or where. Also, it seems like a bug if a client can cause
shards to fail on the server. We are grasping at straws now and appreciate
any ideas on what could be causing this.

In this gist (https://gist.github.com/dblessing/11266650), the first log
message is from node 1 and happens at the same
time that the shard failure occurs on node 2. See node 2 for the stack(s)
that occur when the shard fails. It's interesting that node 1 says it's
closing the connection because of it. Someone on Twitter noted that these
are WARN level messages and don't signify a failure. However, it is causing
queries against this index to totally fail, so there's definitely more than
a WARN scenario going on here. Any thoughts?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/371aa3d3-1b02-4fb5-bad7-b6217e09fb6a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Alexander Reelsen) #2

Hey,

Do you run any additional plugins, or is this stock Elasticsearch? Can you
describe what is happening on your cluster when the shard fails? Do you have
long-running queries or operations? Can you also say more about what this
loader executes, and how?

Minor operational hint: upgrade your JVM to the latest release or to _25;
the one you are using could lead to data corruption with Lucene.
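
To act on this hint, it may help to confirm which JVM each node is actually running. The following is a minimal sketch, assuming the output of the nodes info API (`GET /_nodes/jvm`) has already been fetched and decoded into a dictionary with a `jvm.version` string per node in the `1.7.0_51` form. The function names and the sample data are hypothetical, and the list of suspect versions holds only the one version mentioned in this thread.

```python
import re


def parse_java_version(version):
    """Parse a string such as '1.7.0_51' into (major, minor, patch, update)."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:_(\d+))?", version.strip())
    if match is None:
        raise ValueError("unrecognised Java version: %r" % version)
    # A missing update suffix is treated as update 0.
    return tuple(int(part or 0) for part in match.groups())


def nodes_on_suspect_jvm(nodes_info, suspect_versions):
    """Return {node name: JVM version} for nodes running a suspect JVM."""
    suspects = {parse_java_version(v) for v in suspect_versions}
    flagged = {}
    for node_id, node in nodes_info.get("nodes", {}).items():
        version = node.get("jvm", {}).get("version", "")
        try:
            parsed = parse_java_version(version)
        except ValueError:
            continue  # skip nodes that report no parseable version
        if parsed in suspects:
            flagged[node.get("name", node_id)] = version
    return flagged


# Illustrative data only; a real response contains many more fields.
sample = {
    "nodes": {
        "id-a": {"name": "node1", "jvm": {"version": "1.7.0_51"}},
        "id-b": {"name": "node2", "jvm": {"version": "1.7.0_25"}},
    }
}
print(nodes_on_suspect_jvm(sample, ["1.7.0_51"]))
```

Comparing parsed tuples rather than raw strings avoids false mismatches from stray whitespace; any node the function reports would be a candidate for the JVM upgrade.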

--Alex

On Thu, Apr 24, 2014 at 3:35 PM, Drew Blessing blessing.drew@gmail.com wrote:

https://gist.github.com/dblessing/11266650



(system) #3