Elasticsearch is re-copying all of the shards a node holds replicas of when that node restarts. Meaning that if I have a 10-node cluster and restart one node, the node will copy over all of its data again, even though the restart lasted only about 10 seconds.
Does Elasticsearch have any way to detect missing files and copy just those over instead of the entire index? How would I restart a node without re-copying terabytes worth of information?
There isn't a simple way to restart a node without causing it to start
migrating shards.
But, using the new cluster level update settings API, we can allow for that.
Basically, there is a setting that suspends allocation: you set it through
the cluster level update settings API before the restart, and then, once the
restart is done, re-enable allocation.
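For example, something along these lines (a minimal Python sketch, assuming a node reachable on localhost:9200 and the newer cluster.routing.allocation.enable setting; older releases used cluster.routing.allocation.disable_allocation instead, so adjust the setting name to your version):

    import requests

    ES = "http://localhost:9200"  # assumed node address

    # 1. Suspend shard allocation before taking the node down.
    requests.put(
        ES + "/_cluster/settings",
        json={"transient": {"cluster.routing.allocation.enable": "none"}},
    ).raise_for_status()

    # 2. Restart the node (outside this script) and wait for it to rejoin.

    # 3. Re-enable allocation so the cluster can recover replicas again.
    requests.put(
        ES + "/_cluster/settings",
        json={"transient": {"cluster.routing.allocation.enable": "all"}},
    ).raise_for_status()

With allocation suspended, the cluster should not start rebuilding the restarted node's replicas elsewhere while it is briefly offline.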
Thanks. I just saw that you pushed the change. Thanks for that! At least I know that I will not cause needless network traffic for rolling restarts.
I have one simple question regarding shard replica transfer. I know that with Solr a slave checks all segment files in the directory and only copies segments that it does not have, or newly merged segments. Basically an 'rsync', so it's as efficient as possible.
Does Elasticsearch have any of this type of logic when copying shards to replicas? I ask because we have on the order of 40TB of indexes with replicas that are constantly being updated in real time, and I do need to perform rolling restarts for upgrades to cluster members.
Yes, Elasticsearch will reuse the same index files if possible when it
allocates replicas. Note, replication itself is done quite differently though;
see more about it in this session:
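If you want to check how much data was actually reused during a rolling restart, later Elasticsearch versions expose per-shard recovery statistics; a rough sketch (assuming the _recovery endpoint and a hypothetical index called "myindex"):

    import requests

    ES = "http://localhost:9200"  # assumed node address

    # Fetch recovery details for one index and report file reuse per shard.
    resp = requests.get(ES + "/myindex/_recovery")
    resp.raise_for_status()

    for shard in resp.json().get("myindex", {}).get("shards", []):
        files = shard.get("index", {}).get("files", {})
        print("shard %s: %s of %s files reused" % (
            shard.get("id"),
            files.get("reused", 0),
            files.get("total", 0),
        ))

A high "reused" count relative to "total" indicates the replica was rebuilt mostly from files already on disk rather than copied over the network.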