When I create an index, shards and replicas are allocated evenly across my
nodes. But if one of my nodes dies, its shards get reallocated to other
nodes in the cluster. Then, when it comes back up, it gets allocated some
shards again, BUT not necessarily the ones it started with. This means that
if I restart a particular server, it's going to spend some amount of time
downloading index data from other nodes when it comes back up. If it was
only down for 30 seconds and no documents were indexed in that time, its
Lucene index would still be perfectly valid, but we seem to throw it away.
It just seems like a waste of time to transfer the index (potentially tens
of GB?) when the node already has a valid copy of it.
I know I can use the cluster.routing.allocation.disable_*allocation
settings to stop any automatic allocation, but then Elasticsearch does not
allocate shards for new indices either, which means that every time I
create a new index I would have to do the allocation manually.
What I would like is a way to have Elasticsearch do the allocation when a
new index is created, but leave everything alone after that. Is this
possible, or am I just thinking about this in totally the wrong way?
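For reference, this is roughly how I toggle allocation today (a sketch only, assuming a node on localhost:9200; the exact setting names vary by version, and newer releases replace the disable_* settings with a single cluster.routing.allocation.enable setting):

```shell
# Disable automatic shard allocation cluster-wide before a restart.
# NOTE: "disable_allocation" is the old-style setting; on newer versions
# use "cluster.routing.allocation.enable": "none" instead.
curl -X PUT "localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.disable_allocation": true
  }
}'
```

The transient scope means the setting is lost on a full cluster restart, which is usually what you want for a temporary maintenance window.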
Disabling allocation is my usual route, but as you stated, it might not
work for everybody. You can also try excluding that server from allocation.
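Excluding a single node looks roughly like this (a sketch; the IP is a placeholder, and you can filter on _name or _host instead of _ip):

```shell
# Tell the cluster not to allocate shards to one node, identified by IP.
# 10.0.0.5 is a hypothetical address -- substitute your node's.
curl -X PUT "localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "10.0.0.5"
  }
}'
```

Clear the exclusion (set it to an empty string or null) once the node is back, or its shards will stay rehomed elsewhere.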
If you are not indexing during your planned restarts, I find it helpful to
flush so the transaction log is empty. Another option is to increase the
allocation timeout during your planned outages.
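Those two steps can be sketched as follows (version-dependent: the delayed-allocation setting shown here only exists in Elasticsearch 1.7 and later, so treat it as an assumption about your version):

```shell
# 1. Flush before restarting so the transaction log is empty and
#    recovery has less work to do.
curl -X POST "localhost:9200/_flush"

# 2. Delay reallocation of shards when a node leaves, so a quick restart
#    finishes before the cluster starts copying shards elsewhere.
#    Applied here to all indices; "5m" is an example value.
curl -X PUT "localhost:9200/_all/_settings" \
  -H 'Content-Type: application/json' -d'
{
  "index.unassigned.node_left.delayed_timeout": "5m"
}'
```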