Moving from fs gateway type to cluster using S3/Cloudfiles


(darron) #1

We currently have a single Elasticsearch box running 0.17.6 - it's using
the fs gateway type.

We're really liking the tool and it's working great for our uses - now I am
stepping in to deploy a cluster of boxes using 0.18.4. My goal is to:

  1. Have a cluster of nodes running for speed and availability.
  2. Use the S3/Cloudfiles gateway to keep gateway persistent and more
    durable with VPS failures.

I have been reading the mailing list/guides/tutorials/etc and have come up
with this plan but have a couple questions.

First the proposed plan:

  1. Flush and shut down the currently running 0.17.6.
  2. Update the binaries to 0.18.4.
  3. Start the new binaries and make sure they work.
  4. Add another node to the cluster.
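The first step of the plan could be scripted roughly as follows. This is a minimal sketch, not a tested procedure: it assumes the node listens on `http://localhost:9200` and that the `_flush` and `_cluster/nodes/_shutdown` REST endpoints are available in the release being stopped, so check the API documentation for your version first. The helper names are hypothetical, and the requests are only built here, not sent.

```python
from urllib.request import Request


def flush_request(base_url):
    """Build a POST request that asks the node to flush all indices."""
    return Request(base_url.rstrip("/") + "/_flush", data=b"", method="POST")


def shutdown_request(base_url):
    """Build a POST request that asks the nodes in the cluster to shut down."""
    return Request(
        base_url.rstrip("/") + "/_cluster/nodes/_shutdown",
        data=b"",
        method="POST",
    )


# Assumed address of the single existing node.
BASE_URL = "http://localhost:9200"

for request in (flush_request(BASE_URL), shutdown_request(BASE_URL)):
    # Printing instead of sending; pass each request to
    # urllib.request.urlopen() to actually execute it.
    print(request.get_method(), request.full_url)
```

Sending the flush before the shutdown request keeps the order described in step 1.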

Here are my questions:

  1. Because it's currently using the fs gateway, unless they all have access
    to that directory of files then the others will not be able to recover from
    failure properly. We should have used local gateway for that single box
    right?

  2. If I add the new nodes with the S3/Cloudfiles gateway, will they
    replicate all of the indexes properly to S3/Cloudfiles? Or do they only add
    the items when they're indexed? I have read a ton about people trying to do
    that, looks like they need to re-index so that the offsite gateways get
    populated - is that correct?

The clustering seems to be pretty magical as far as discovery goes - this
is the config that worked great with Rackspace Cloud
boxes: https://gist.github.com/1390228

  3. Do the boxes have to be on the same subnet to find each other?

I couldn't find any details about that and it all worked flawlessly, but
wondering how I'll add boxes in the future when they run out of IP
addresses that are "nearby". Or if that even matters.
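For context on the subnet question: automatic discovery of this kind usually relies on multicast, which often does not cross subnet boundaries. Zen discovery also supports an explicit unicast host list. The sketch below renders the relevant `elasticsearch.yml` lines; the host addresses and cluster name are made-up placeholders, and the setting names should be checked against the documentation for the release in use.

```python
import json


def discovery_settings(hosts, cluster_name):
    """Render flat elasticsearch.yml lines for unicast zen discovery."""
    if not hosts:
        raise ValueError("at least one seed host is required")
    lines = [
        f"cluster.name: {cluster_name}",
        # Turn off multicast so nodes only contact the listed hosts.
        "discovery.zen.ping.multicast.enabled: false",
        # A JSON list is also valid YAML flow-sequence syntax.
        f"discovery.zen.ping.unicast.hosts: {json.dumps(list(hosts))}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical addresses and cluster name, for illustration only.
print(discovery_settings(["10.0.0.11:9300", "10.0.0.12:9300"], "search-prod"))
```

With a host list like this, new boxes only need to be able to reach one of the listed nodes over TCP, wherever their IP addresses happen to be.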

Sorry if I've missed something, I have read everything I can find and I'm
just trying to make sure on my last few questions. I found a lot of
information that appeared to apply to old releases and just wanted to
clarify.

We're really loving the product and I spent much of this weekend adding and
removing nodes to my cluster, dropping and adding all sorts of indexes and
watching it rebalance - very nice work so far.


(Shay Banon) #2

To be honest, I am lost. Here are some points:

  1. You say you use Rackspace and that it works, but the config points to
    AWS configuration?
  2. There is no Rackspace Cloud Files support to act as a gateway, only S3.
  3. In any case, for 2, you should start with the local gateway. It's
    perfectly fine to use on one node or on many. I don't understand why you
    used the fs gateway in your one-node scenario.
  4. Changing the gateway implementation requires reindexing.
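Reindexing here means reading every document out of the old node and indexing it again into the new cluster. One way to do the write side is the bulk API, which takes newline-delimited JSON: an action line followed by the document source. The sketch below only builds that request body; the `build_bulk_body` helper and the sample hits are hypothetical, and fetching the hits from the old node and POSTing the body to `/_bulk` are left out.

```python
import json


def build_bulk_body(hits, dest_index):
    """Turn search hits into a newline-delimited bulk "index" request body."""
    lines = []
    for hit in hits:
        action = {
            "index": {
                "_index": dest_index,
                "_type": hit["_type"],
                "_id": hit["_id"],
            }
        }
        lines.append(json.dumps(action))          # action/metadata line
        lines.append(json.dumps(hit["_source"]))  # document source line
    # The bulk format expects every line, including the last, to end in "\n".
    return ("\n".join(lines) + "\n") if lines else ""


# Made-up hits in the shape a search response returns them.
sample_hits = [
    {"_type": "post", "_id": "1", "_source": {"title": "first"}},
    {"_type": "post", "_id": "2", "_source": {"title": "second"}},
]
print(build_bulk_body(sample_hits, "posts_v2"), end="")
```

Keeping the original `_id` values means a rerun overwrites documents instead of duplicating them.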


(Darron Froese) #3

On Thu, Nov 24, 2011 at 7:02 AM, Shay Banon kimchy@gmail.com wrote:

To be honest, I am lost, here are some points:

  1. You say you use Rackspace, and it works, but the config points to AWS
    configuration?

Sorry for being confusing - we're using a Rackspace cloud VPS with the
S3 gateway.

  2. There is no rackspace cloudfiles support to act as gateway, only s3.

No problem - I thought there was - was testing with S3 and will stick with that.

I must have gotten confused with the Gateway information here:

http://www.elasticsearch.org/blog/2010/05/11/here-comes-the-cloud.html

  3. In any case for 2, you should start with local gateway, its perfectly
    fine to use on one node, to many. I don't understand why you used fs gateway
    in your one node scenario now.

I figured local should have been used - I didn't actually set that box
up but am stepping in and setting up the cluster now.

  4. Changing gateway implementation requires reindexing.

No problem - we'll make that happen.

Thanks for the response - sorry for confusing it all.

It's an incredible tool - we're really happy to have found it and will
be using it for other projects going forward.


(Shay Banon) #4

No problem :), answers below

On Thu, Nov 24, 2011 at 8:13 PM, Darron Froese darron@nonfiction.ca wrote:

On Thu, Nov 24, 2011 at 7:02 AM, Shay Banon kimchy@gmail.com wrote:

To be honest, I am lost, here are some points:

  1. You say you use Rackspace, and it works, but the config points to AWS
    configuration?

Sorry for being confusing - we're using a Rackspace cloud VPS with the
S3 gateway.

I see. That's probably a bad idea: performance when talking to S3 from
outside AWS is not great.

  2. There is no rackspace cloudfiles support to act as gateway, only s3.

No problem - I thought there was - was testing with S3 and will stick with
that.

I suggest you use local gateway.

I must have gotten confused with the Gateway information here:

http://www.elasticsearch.org/blog/2010/05/11/here-comes-the-cloud.html

It's an "old" post, from before the local gateway was implemented... :)

  3. In any case for 2, you should start with local gateway, its perfectly
    fine to use on one node, to many. I don't understand why you used fs
    gateway in your one node scenario now.

I figured local should have been used - I didn't actually set that box
up but am stepping in and setting up the cluster now.

  4. Changing gateway implementation requires reindexing.

No problem - we'll make that happen.

Just double checking that you will use local gateway.
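For reference, switching to the local gateway is a small configuration change. The sketch below renders the `elasticsearch.yml` lines involved; the node counts and wait time are illustrative assumptions rather than recommendations, and the setting names should be verified against the gateway documentation for the release in use.

```python
def local_gateway_settings(expected_nodes, recover_after_nodes, recover_after_time="5m"):
    """Render flat elasticsearch.yml lines for the local gateway."""
    if not 1 <= recover_after_nodes <= expected_nodes:
        raise ValueError("recover_after_nodes must be between 1 and expected_nodes")
    lines = [
        "gateway.type: local",
        # Wait for at least this many nodes before starting recovery...
        f"gateway.recover_after_nodes: {recover_after_nodes}",
        # ...then wait up to this long for the rest, unless all expected
        # nodes have already joined.
        f"gateway.recover_after_time: {recover_after_time}",
        f"gateway.expected_nodes: {expected_nodes}",
    ]
    return "\n".join(lines) + "\n"


# Example values for a hypothetical three-node cluster.
print(local_gateway_settings(expected_nodes=3, recover_after_nodes=2))
```

The recovery settings only matter on a full cluster restart; they delay recovery until enough nodes are back so that shards are not reallocated needlessly.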

Thanks for the response - sorry for confusing it all.

It's an incredible tool - we're really happy to have found it and will
be using it for other projects going forward.


(Darron Froese) #5

Hmm - I wanted to use S3 to have an offsite way to recover - in case
something happened with the cluster.

If we use the local gateway and back up the box every day, does that
have enough data to be able to recover data for the whole cluster?

Alternately, we could just set up some AWS boxes and use the S3 gateway -
that should solve the performance issues.


(Shay Banon) #6

You can use the local gateway and back up the data locations.
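A backup of a data location could look something like the sketch below. It assumes the data directory is whatever `path.data` points to on that node and that a flush was issued beforehand; the function name and archive naming are hypothetical. Each node's data directory holds only the shards allocated to that node, so for whole-cluster recovery this would presumably need to run on every node.

```python
import tarfile
import time
from pathlib import Path


def archive_data_dir(data_dir, dest_dir, stamp=None):
    """Pack a node's data directory into a timestamped .tar.gz archive."""
    data_dir = Path(data_dir)
    if not data_dir.is_dir():
        raise FileNotFoundError(f"data directory not found: {data_dir}")
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / f"es-data-{stamp}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        # Store paths relative to the data directory's own name.
        tar.add(str(data_dir), arcname=data_dir.name)
    return archive
```

A call such as `archive_data_dir("/var/lib/elasticsearch/data", "/backups")` (both paths are placeholders) returns the path of the new archive, which can then be copied offsite, for example to S3.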


(Darron Froese) #7

No problem - we'll do that then - thanks for your help.

