We're getting close to a production release with Elasticsearch as one of the components in our system. From time to time we will want to update ES and also possibly redistribute the cluster (change the sharding / replication). We think the safest way to do this is probably to start another cluster and migrate the data over. What do you think the best (quickest?) option would be:

- read from the gateway (is this possible for redistribution?)
- pull the data from the existing cluster into the new cluster using
  a) an Elasticsearch river?
  b) a custom script / app to pull the data across
If the change requires reindexing, then you need to get the documents and index them again. The best way is to get them from the original place where they are stored, or, in the upcoming 0.16, use the scan API.
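A rough sketch of the pull-across approach Shay mentions, using the scan-type scroll (as it shipped in 0.16) to page through the old cluster and the bulk API to load the new one. The URLs, index names, and page size are placeholders, not anything from the thread, and the scroll request shape is the 0.16-era one:

```python
import json
import urllib.request

def hits_to_bulk(hits, target_index):
    """Render one page of search hits as a newline-delimited _bulk body."""
    lines = []
    for hit in hits:
        # Action line, then the document source, per the bulk API format.
        lines.append(json.dumps({"index": {"_index": target_index,
                                           "_type": hit["_type"],
                                           "_id": hit["_id"]}}))
        lines.append(json.dumps(hit["_source"]))
    return "\n".join(lines) + "\n"

def http(method, url, body=None):
    data = body.encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def reindex(src, dst, index, target_index):
    # src / dst are base URLs such as "http://old-node:9200" (illustrative).
    # A scan-type search returns only a scroll id on the first request...
    out = http("POST",
               "%s/%s/_search?search_type=scan&scroll=5m&size=100" % (src, index),
               json.dumps({"query": {"match_all": {}}}))
    scroll_id = out["_scroll_id"]
    while True:
        # ...and the actual hits arrive on subsequent scroll requests,
        # until an empty page signals the end.
        page = http("GET",
                    "%s/_search/scroll?scroll=5m&scroll_id=%s" % (src, scroll_id))
        hits = page["hits"]["hits"]
        if not hits:
            break
        scroll_id = page["_scroll_id"]
        http("POST", dst + "/_bulk", hits_to_bulk(hits, target_index))
```

Pulling from the original store (a database, a log of source documents) is simpler still, since you skip the scroll bookkeeping entirely.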
If your change does not require reindexing (no mappings were changed and there is no need to change the number of shards), then you can simply copy over the gateway data (if you are using a shared gateway), or copy over to each node the "data" directory of the existing nodes (this also applies when using the "local" gateway).
Make sense?
-shay.banon
On Monday, February 28, 2011 at 2:23 PM, Paul Loy wrote:
Hi,

I just tried a similar technique on 0.14.2, but it doesn't appear to be working as I'd expect. I've tried this a couple of times and have gotten the exact same results:
- I have a three-node cluster with the local gateway.
- I updated the replica count to 2, so that every machine will have all the shards.
- I stopped a node, which should now have a full copy of the index data.
- I then try to rename the cluster on this node by renaming the data directory and updating the elasticsearch.yml file with a new cluster name. I also tweak the config file to only expect one node.
- When I start things up, it appears that a few empty indexes get picked up correctly, but nothing with data loads. The new cluster just sits there in a red state.
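For reference, the single-node, renamed-cluster config described above might look something like this; the cluster name is a placeholder, and the gateway settings are the local-gateway knobs from that era of ES:

```yaml
# Hypothetical elasticsearch.yml for the renamed single-node copy.
cluster.name: newcluster           # must match the renamed directory under data/
gateway.type: local
gateway.recover_after_nodes: 1     # start recovering once one node is up
gateway.expected_nodes: 1          # don't wait for the rest of the old cluster
```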
Thanks Clinton. I didn't see any indication that there is any cross-communication between the old and new clusters, but I tried changing the port anyway and am getting the same behavior.
No data will be loaded because, with 2 replicas, the cluster will only recover an index once 2 instances of a shard are found within the cluster. You don't have to increase the number of replicas: you can simply copy the 3 nodes' data directories over to the new 3 machines, rename the cluster (and rename it under the data dir), and it should work.
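After copying the data over and starting the new cluster, you can tell whether recovery actually succeeded by polling the cluster health API until the status leaves red. A small sketch, with the base URL as a placeholder:

```python
import json
import time
import urllib.request

def recovered(health):
    """True once the cluster has left red, i.e. every primary shard
    was found and allocated (yellow means replicas are still missing)."""
    return health.get("status") in ("yellow", "green")

def wait_for_recovery(base_url, timeout=60):
    # base_url is illustrative, e.g. "http://localhost:9200"
    deadline = time.time() + timeout
    while time.time() < deadline:
        with urllib.request.urlopen(base_url + "/_cluster/health") as resp:
            health = json.load(resp)
        if recovered(health):
            return health
        time.sleep(1)
    raise TimeoutError("cluster still red after %ds" % timeout)
```

If the cluster stays red, that usually means not enough shard copies were found under the data directory it was pointed at.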
On Tuesday, March 1, 2011 at 7:33 PM, Paul wrote:
Thanks Clinton.