Constant Recovering and Unassigned shards for an index

Hello,
I've added a new server to an ES cluster, and it was recognized OK. I have 2
indexes: one is mostly read-only, the other takes heavier writes. The first
one recovered completely from the master and all its shards are green. The
second index has been trying to recover for almost 6 hours with no progress
at all. I see that it puts 2 shards in Recovering mode and gets a little data
(around 1 MB), then puts them back in Unassigned mode and tries another
2 shards. It repeats this pattern forever.
What could it be? How can I solve it?

Thank you,
Eugene

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

The whole weekend, the same problem. One index fails to replicate from the
master; the other index is OK. I've restarted the whole ES cluster, and it
did not help. I need any advice. I have no clue what's going on; all logs
are clean.
Thanks in advance,
Eugene

Hi Eugene,

Is there anything in the logs about this?

Cheers,
Boaz

Maybe this will help someone in the future. I had to stop my applications
altogether, causing almost an hour of production downtime. After that, ES was
able to replicate. I hope there is a better solution; if someone knows one,
please share.

Hi Eugene,

Google marked my reply as spam, so I guess you didn't get it... Did you
see anything in the logs about this?

Cheers,
Boaz

Hello Boaz,
that is what bothered me the most: all the logs were clean. I set the log
level to DEBUG for everything I could think of, and there was still nothing
suspicious in the logs.
They showed that one index was restored shard by shard, and that was it.
While the second index was stuck in the loop, I didn't see any new log
records at all.

Eugene

Not the ideal solution, but whenever I get stuck recovering shards, I
simply set my replicas to 0 and let the cluster get back to green. From
there, I raise the replica count again. While the replicas are at 0, if a
node goes down you will not have all the shards for an index, but it gets me
out of the first issue.

--
Ivan
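
The workaround Ivan describes could be scripted roughly as follows. This is a minimal sketch, not a tested procedure: the host http://localhost:9200 and the index name my_index are placeholders, the helper names are made up for illustration, and it assumes the index update-settings endpoint (PUT /{index}/_settings) and the cluster health endpoint (GET /_cluster/health) are available on your version.

```python
import json
import time
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node, default HTTP port


def replica_settings_request(index, replicas, base_url=ES_URL):
    """Build (but do not send) the request that changes the replica count."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url="%s/%s/_settings" % (base_url, index),
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def cluster_status(base_url=ES_URL):
    """Return the cluster health status string: green, yellow or red."""
    with urllib.request.urlopen(base_url + "/_cluster/health") as resp:
        return json.loads(resp.read().decode("utf-8"))["status"]


def reset_replicas(index, target_replicas, poll_seconds=10):
    """Drop replicas to 0, wait for green, then restore the replica count."""
    urllib.request.urlopen(replica_settings_request(index, 0)).close()
    while cluster_status() != "green":
        time.sleep(poll_seconds)
    urllib.request.urlopen(replica_settings_request(index, target_replicas)).close()
```

Calling reset_replicas("my_index", 1) would run the whole sequence against the placeholder index. As noted above, the index has no redundancy between the two settings changes, so a node failure in that window can lose shards.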

Thanks Ivan,
I'm trying to add another node to the cluster, and got stuck with the same
problem again.
I'm trying your method; it is better than shutting down the whole system and
waiting for replication to finish.
I've set replicas to 0, and my cluster reports green now. But 2 out of 5
of my shards are now constantly in RELOCATING state, even though the number
of replicas is 0.
Did you see this as well? Should I just wait? Is there anything else that
could be done here?

Thanks for your help,
Eugene

OK, I've finally got my cluster into a healthy state. I still don't know
what the problem was, but here is what I've done:

  • Initial problem: Added a new node into the cluster, replication is not
    happening. Shards are in RECOVERY state constantly.
  • Set number of replicas to 0 -> I've got 2 shards relocated to the new
    node, and 3 shards are still in the existing node.
  • Set number of replicas to 1 -> the cluster is trying to replicate, but
    the same pattern repeats - 2 shards are getting from unassigned to recovery
    state, then back to unassigned.
  • Added a 3rd node into the cluster, set number of replicas to 2 -> Some
    shards got replicated some are still looping to recovery and back to
    unassigned.
  • Luckily, I got at least 1 replica for each shard on different nodes, while
    some shards were still looping into unassigned state. I shut down the 1st
    node (the master). A new master got elected and all shards got replicated.
    I've set the number of replicas to 1, and the cluster is green now.

I'm guessing I had some problem with the master, and even a restart didn't
help. But once a new master was elected, the situation returned to normal.

Thanks for your help,
Eugene

Hi Eugene,

That sucks. This sounds like a Java backward compatibility issue we ran
into a while ago. Can you post the exact version numbers ES runs with on
all nodes? The easiest way is to gist the output of
http://localhost:9200/_nodes?all, which might reveal more interesting
information.

Cheers,
Boaz
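
The version check Boaz asks for could be automated along these lines. This is a sketch under assumptions: it expects the parsed JSON of the nodes info response to have a top-level "nodes" object whose entries carry a "version" field and a "jvm" object with its own "version" field, and the function names are hypothetical. The sample data is invented for illustration.

```python
from collections import defaultdict


def versions_by_node(nodes_info):
    """Group node names by (ES version, JVM version).

    nodes_info is the parsed JSON of the nodes info response. The field
    layout used here (nodes -> id -> version / jvm.version) is an assumption.
    """
    groups = defaultdict(list)
    for node_id, node in nodes_info.get("nodes", {}).items():
        es_version = node.get("version", "unknown")
        jvm_version = node.get("jvm", {}).get("version", "unknown")
        # Fall back to the node id when the node reports no name.
        groups[(es_version, jvm_version)].append(node.get("name", node_id))
    return dict(groups)


def has_version_mismatch(nodes_info):
    """True when the nodes do not all report the same ES and JVM versions."""
    return len(versions_by_node(nodes_info)) > 1


# Made-up sample, shaped like the assumed response.
sample = {
    "nodes": {
        "a1": {"name": "node-1", "version": "0.90.5", "jvm": {"version": "1.7.0_25"}},
        "b2": {"name": "node-2", "version": "0.90.5", "jvm": {"version": "1.6.0_45"}},
    }
}
print(has_version_mismatch(sample))  # True: the JVM versions differ
```

More than one key in the result of versions_by_node means at least two nodes disagree on either the ES or the JVM version, which is the kind of mix the message above suspects.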

Is there an issue open for this already on GitHub? I'm having the same
issue in our system and we are getting around it by doing what Ivan posted.
In our system (two nodes in the cluster, both master and data, and many
clients connecting with TransportClient) the JVMs are exactly the same version:
version: "1.7.0_25"
vm_name: "Java HotSpot(TM) 64-Bit Server VM"
vm_version: "23.25-b01"
vm_vendor: "Oracle Corporation"

and the ES version is also exactly the same on all nodes: 0.90.5.

Thanks!

Hi Carofe,

We actually still don't know exactly what the problem is and need more
info to make it concrete. I'd love to help trace it.

Can you describe exactly what you are seeing? I take it replication fails?

These would help:

  1. Anything in the logs?
  2. A dump of the cluster state at the moment you have the problem (you can
    get it with curl -XGET "http://localhost:9200/_cluster/state")
  3. A dump of the node stats:
    curl -XGET "http://localhost:9200/_cluster/nodes/stats/?all"

Cheers,
Boaz
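
The two dumps Boaz lists could be collected in one go with a short script. A sketch only: the endpoint paths are the ones quoted in the message above and may differ in other ES versions, while the host, the output file names, and the helper names are assumptions made for illustration.

```python
import time
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node, default HTTP port

# Endpoint paths as quoted in the message above.
ENDPOINTS = {
    "cluster_state": "/_cluster/state",
    "nodes_stats": "/_cluster/nodes/stats/?all",
}


def dump_targets(timestamp, base_url=ES_URL):
    """Return (url, filename) pairs for one diagnostic snapshot."""
    return [
        (base_url + path, "%s-%s.json" % (name, timestamp))
        for name, path in sorted(ENDPOINTS.items())
    ]


def collect_dumps(base_url=ES_URL):
    """Fetch each endpoint and write the raw response to a timestamped file."""
    timestamp = time.strftime("%Y%m%dT%H%M%S")
    written = []
    for url, filename in dump_targets(timestamp, base_url):
        with urllib.request.urlopen(url) as resp, open(filename, "wb") as out:
            out.write(resp.read())
        written.append(filename)
    return written
```

Running collect_dumps() while the shards are looping would leave two timestamped JSON files in the current directory, ready to attach to a gist together with the relevant log excerpts.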
