I am trying to use cluster.routing.allocation.enable to speed up node
restarts. As I understand it, if I set cluster.routing.allocation.enable to
"none", restart a node, then set cluster.routing.allocation.enable to
"all", the shards that go UNASSIGNED when the node goes down should start
back up on the same node they were assigned to previously. But in practice
when I do this, the shards get assigned across the entire cluster when I
set cluster.routing.allocation.enable back to "all", and then after that,
some amount of rebalancing happens.
How can I avoid this, and make shards on a restarted node come back on the
same node?
To be clear, here's exactly the sequence of events:
1. Set cluster.routing.allocation.enable to "none"
2. Restart a node
3. The shards that were on that node go UNASSIGNED
4. Set cluster.routing.allocation.enable back to "all"
5. The UNASSIGNED shards mentioned in (3) begin being assigned across all nodes in the cluster
6. After all UNASSIGNED shards are assigned, some start rebalancing (migrating to other nodes)
7. Cluster is happy
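For reference, steps (1) and (4) are just transient cluster settings updates, roughly like this (a sketch in Python using the requests library; the localhost URL is a placeholder for any node in the cluster):

    import requests

    ES = "http://localhost:9200"   # placeholder; any node in the cluster

    # (1) before stopping the node: disable shard allocation
    requests.put(ES + "/_cluster/settings",
                 json={"transient": {"cluster.routing.allocation.enable": "none"}})

    # (2)-(3) restart the node outside of this script and wait for it to rejoin

    # (4) re-enable allocation; this is where I expect the local shard copies
    #     to come straight back, but they get assigned across the whole cluster
    requests.put(ES + "/_cluster/settings",
                 json={"transient": {"cluster.routing.allocation.enable": "all"}})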
The amount of data in this cluster is very large, and this process can take
close to 24 hours. So I'd like very much to avoid that for routine restarts.
We set allocation to "new_primaries" to allow the cluster to keep working (and
to be able to create indices) during the restart. But we hit the same issue:
the node is back up, but nothing happens until I enable all allocation again.
I have gone through the Elasticsearch documentation on recovery, gateway, and
cluster settings without finding any parameter to activate or configure this
initial recovery of local indices.
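The closest thing I found is the gateway recovery settings, which seem to only delay when the initial recovery starts rather than control where shards end up. For example, in elasticsearch.yml (the values here are purely illustrative):

    # elasticsearch.yml -- gateway recovery settings (illustrative values only)
    gateway.recover_after_nodes: 3    # don't start initial recovery until 3 nodes have joined
    gateway.expected_nodes: 4         # recover immediately once all 4 expected nodes are present
    gateway.recover_after_time: 5m    # otherwise wait up to 5 minutes before recovering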
On Mon, Jul 7, 2014 at 4:16 AM, Grégoire Seux g.seux@criteo.com wrote:
Andrew,
Have you found a solution (or explanation) to your issue?
We are using elasticsearch 1.1.1, what about you?
Hi,
I haven't learned anything new. To be clear about my problem, I am
aware that I must re-enable routing after having disabled it. My issue
is that I expect all the UNASSIGNED shards to go back to the same
node, but some do not, only to get rebalanced back there later. I am
running elasticsearch 1.2.1.
I guess I'll ask about this once more for now. This happened again today. I
set allocation to new_primaries, restarted a node, then set allocation back to
all, and the cluster is now assigning the shards that were on the restarted
node across all nodes. When it's done, which will probably take a day, it'll
likely rebalance by moving them back to the original node. I have to assume
I'm doing something wrong here. Am I?
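For what it's worth, here's roughly how I watch what the cluster does after flipping allocation back to "all" (a rough sketch against the _cat/shards API; the localhost URL is a placeholder):

    import requests
    from collections import Counter

    ES = "http://localhost:9200"   # placeholder; any node in the cluster

    # _cat/shards prints one line per shard copy:
    #   index shard prirep state docs store ip node
    lines = requests.get(ES + "/_cat/shards").text.splitlines()
    states = Counter(line.split()[3] for line in lines if line.strip())

    # Right after setting allocation back to "all" this shows a burst of
    # INITIALIZING shards on other nodes, and later RELOCATING shards as
    # they get moved back towards the restarted node.
    print(states)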
It still remains a bit of black magic to me. Sometimes it works, sometimes
it does not.
Cheers,
Ivan
Ivan,
I have read over that documentation several times and I don't understand how
it would help me. I mean that honestly: it's not that I'm unwilling to try it.
Those settings read like they control when recovery begins, but my problem
isn't that recovery starts when I don't want it to. It's that when I do start
it (by setting shard allocation back to "all"), shards that I'd expect to
simply stay on the node they were previously on get assigned to other nodes,
and then ultimately get rebalanced back to the original node.
At this point I am finding that for quick restarts, just doing them with no
additional prep work allows me to recover in ~30m, vs ~24h. So for now I'm
just going to do that. Whatever I am doing wrong here just isn't at all
clear to me.
Thanks for your advice. If I have misunderstood the settings you pointed me
at and you think you can help me understand, I'd be grateful for more
information.
Andy
The idea is that rebalancing should be delayed while the cluster recovers, but
even with these settings, I often find that shards are moved immediately.
Are you using the default store throttling settings? I found them to be
quite low.
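If you are still on the defaults, they can be bumped dynamically. I'm thinking of indices.store.throttle.max_bytes_per_sec in particular; something like this (the value is only an example, not a recommendation):

    import requests

    ES = "http://localhost:9200"   # placeholder; any node in the cluster

    # raise the store throttle from its (low) default of 20mb per second;
    # "100mb" is just an example value
    requests.put(ES + "/_cluster/settings",
                 json={"transient": {"indices.store.throttle.max_bytes_per_sec": "100mb"}})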
Cheers,
Ivan
I've seen this as well, Ivan, and have also had a few people on IRC comment on
the same thing: shards that are local are not simply being initialised, but
are being reallocated elsewhere.