Upgrading Elasticsearch from 19.8 -> 20.1, S3 Gateway -> Local

We are looking to upgrade our ES instances from 19.8 to 20.1. We have
about 7 AWS nodes in a cluster, with about 25 million docs indexed. I see
that the shared gateway has been deprecated in favor of the local gateway.
Is there a way to migrate over the gateway? From what I can tell, the only
alternative is to boot up another cluster which uses a local gateway and
copy over each of the documents. Is there an easier way to do so?

--

You can migrate the data using Elasticsearch itself; here are some
guidelines from an earlier reply
[https://groups.google.com/forum/#!msg/elasticsearch/sWp9XDzNmk8/YFRVSFtWFE4J]:

  • Create two new Provisioned IOPS EBS volumes [1], with enough space to
    hold your data
  • Launch new EC2 instances with proper security groups
  • Mount an EBS volume on each new instance [2], at a suitable location
    such as /usr/local/var/elasticsearch/data1
  • Install and configure Elasticsearch on each machine, using the same
    cluster name as your original cluster and a local gateway, with the
    data path pointed to the location where you mounted the EBS volume
  • Launch Elasticsearch on these new instances
  • Increase the number_of_replicas for your indices to four (i.e. the
    number of nodes minus one, so that every node can hold a full copy).
    Your data will now be spread across all the nodes: the old ones and
    the new ones.
  • Use the Paramedic, BigDesk or Head Elasticsearch plugins to monitor
    cluster health: once the cluster reports "green" health and all shards
    are allocated, you can shut down the old, S3-based nodes
  • You have now migrated all data to a new cluster. The best practice at
    this point is to take a snapshot of your EBS volumes, so you have a
    recovery strategy. You can delete the S3 buckets after doing that.
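
As a rough sketch, the configuration from the steps above might look like
this in elasticsearch.yml on each new node (the cluster name and data
location below are placeholders, not values from this thread):

```yaml
# elasticsearch.yml on a new node (placeholder values)
cluster:
  name: my-cluster          # must match the name of the original cluster

gateway:
  type: local               # local gateway instead of the deprecated S3 gateway

path:
  data: /usr/local/var/elasticsearch/data1   # where the EBS volume is mounted
```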

This strategy also allows you to scale when the volume of your data grows
while the computing capacity of your cluster is still sufficient: you can
create a new set of EBS volumes, mount them at a location such as
/usr/local/var/elasticsearch/data2, and point the Elasticsearch path.data
setting to both locations (it is possible to use multiple directories as
path.data).
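
For example, a second set of volumes mounted at data2 could be added like
this (the paths are placeholders; a comma-separated string should work as
well as a list):

```yaml
path:
  data:
    - /usr/local/var/elasticsearch/data1
    - /usr/local/var/elasticsearch/data2
```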

I'm currently working on having EBS support like that in the Chef cookbook
[https://github.com/karmi/cookbook-elasticsearch/], so it should be easier
to quickly whip up clusters on AWS in the near future.

Karel

On Monday, December 10, 2012 8:51:18 PM UTC+1, Neil Moonka wrote:


--

Thanks! This essentially worked, though with a modification. We had added
a couple of extra nodes recently to support some upcoming activities, so
we were operating with a little breathing room. We turned off a box,
pointed it to the local gateway, launched it, and then let it replicate
(upping number_of_replicas). We then worked our way through the cluster,
bouncing the nodes and pointing them to the local gateway in a
rolling-restart fashion. This let us avoid any downtime, thanks!
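
For reference, the "wait until it is safe to bounce the next node" check in
a rolling restart like this can be sketched against the cluster health API
response (the helper function here is made up; the field names are those
returned by GET /_cluster/health):

```python
def safe_to_bounce_next_node(health):
    """Given the parsed JSON body from GET /_cluster/health, decide whether
    it is safe to restart the next node in a rolling fashion: the cluster
    must be green and no shards may still be moving or unassigned."""
    return (health.get("status") == "green"
            and health.get("relocating_shards", 0) == 0
            and health.get("initializing_shards", 0) == 0
            and health.get("unassigned_shards", 0) == 0)
```

Polling this between restarts is what keeps every shard fully replicated
before the next node goes down.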

-Neil

On Tuesday, December 11, 2012 1:19:46 AM UTC-8, Karel Minařík wrote:


--

Hello,

Yes, that sounds like a good approach. Note that you could also use the auto_expand_replicas setting [https://github.com/elasticsearch/elasticsearch/issues/623, http://www.elasticsearch.org/guide/reference/api/admin-indices-update-settings.html] to automatically increase the number of replicas when a node is added, and decrease it when a node is shut down, minimizing the administration around this workflow a bit.
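
For example, auto_expand_replicas could be set as a node-level default in
elasticsearch.yml like this (it can also be set per index through the
update settings API linked above; the "0-all" range means anywhere from
zero replicas up to one on every node):

```yaml
index:
  auto_expand_replicas: 0-all
```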

Karel

On Dec 12, 2012, at 2:14 AM, Neil Moonka nkmoonka@gmail.com wrote:


--
