EC2 cluster storage question

Paul_Sanwald_2 · February 24, 2015, 3:27pm

More detail below, but the crux of my question is: What's the best way
to spin up/down "on demand" an ES cluster on EC2 that uses ephemeral local
storage? Essentially, I want to run the cluster during the week and spin
down over the weekend. Other than brute-force snapshot/restore, is there
any more creative way to do this, like mirroring local storage to EBS or
similar?

Some more background:
We run multiple ES clusters on EC2 (we use OpsWorks for deployment
automation). We started out several years back using EBS because we didn't
know any better, and have since switched over to SSD-based local storage.
The performance improvements have been unbelievable.

Obviously, using ephemeral local storage comes at a cost: we use
replication, take frequent snapshots, and store all source data to mitigate
the risk of data loss. The other thing that local storage means is that our
cluster essentially needs to be up and running 24/7, which I think is
fairly normal.

I'm investigating some ways to save on cost for a large-ish cluster, and
one observation is that it doesn't necessarily need to run 24/7;
specifically, we want to turn the cluster off over the weekend. That said,
restoring terabytes from snapshot doesn't seem like a very efficient way to
do this, so I want to consider alternatives, and was hoping the community
could help me identify options that I am missing.

thanks in advance for any thoughts you may have.

--paul

--
Important Notice: The information contained in or attached to this email
message is confidential and proprietary information of RedOwl Analytics,
Inc., and by opening this email or any attachment the recipient agrees to
keep such information strictly confidential and not to use or disclose the
information other than as expressly authorized by RedOwl Analytics, Inc.
If you are not the intended recipient, please be aware that any use,
printing, copying, disclosure, dissemination, or the taking of any act in
reliance on this communication or the information contained herein is
strictly prohibited. If you think that you have received this email message
in error, please delete it and notify the sender.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/588e485c-6029-4ded-a3ce-a8dd01213510%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

warkolm · February 25, 2015, 5:58am

Why not just shut the cluster down, disable allocation first and then just
gracefully power things off?
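
A minimal sketch of the "disable allocation first" step, assuming the cluster settings API (`PUT /_cluster/settings`) and the `cluster.routing.allocation.enable` setting found in Elasticsearch 1.0 and later. The host URL and the function name are hypothetical, and the code only builds the request; it sends nothing.

```python
import json

# Allocation modes accepted by cluster.routing.allocation.enable.
VALID_MODES = ("all", "primaries", "new_primaries", "none")

def allocation_request(mode, host="http://localhost:9200"):
    """Build (method, url, body) for switching shard allocation on or off.

    Use "none" before powering nodes off and "all" once they have rejoined.
    A persistent setting is used here (an assumption, not from the thread)
    so that it survives a full cluster restart and has to be reset explicitly.
    """
    if mode not in VALID_MODES:
        raise ValueError("unsupported allocation mode: %r" % mode)
    body = {"persistent": {"cluster.routing.allocation.enable": mode}}
    return "PUT", host + "/_cluster/settings", json.dumps(body)

before_shutdown = allocation_request("none")
after_startup = allocation_request("all")
```

The returned tuple can be handed to whatever HTTP client the deployment automation already uses.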


Norberto_Meijome · February 25, 2015, 7:14am

OP points out he is using ephemeral storage, hence shutdown will destroy
the data. But it can be rsynced to EBS as part of the shutdown
process, and then the reverse done when starting things up again.

Though I guess you could let ES take care of it by tagging nodes
accordingly and updating the index settings (hope that makes sense).
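
The rsync idea could be sketched roughly as below. The mount points are hypothetical, and the sketch assumes Elasticsearch is stopped on the node before the copy out and not started until the copy back has finished, so that the files on disk are not changing mid-copy. It only builds the commands; running them is left to the shutdown/startup scripts.

```python
import shlex

def rsync_command(src, dst):
    """Return an rsync argv that mirrors the contents of src into dst.

    -a preserves permissions, ownership and timestamps; --delete removes
    files from dst that no longer exist in src. The trailing slash on src
    copies the directory's contents rather than the directory itself.
    """
    return ["rsync", "-a", "--delete", src.rstrip("/") + "/", dst.rstrip("/") + "/"]

# Hypothetical paths: ES data on the instance store, an EBS volume mounted alongside.
LOCAL_DATA = "/mnt/ephemeral/elasticsearch"
EBS_COPY = "/mnt/ebs/elasticsearch"

save_cmd = rsync_command(LOCAL_DATA, EBS_COPY)     # at shutdown, after ES has stopped
restore_cmd = rsync_command(EBS_COPY, LOCAL_DATA)  # at startup, before ES is started

print(" ".join(shlex.quote(arg) for arg in save_cmd))
```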

warkolm · February 25, 2015, 10:30am

Fair point. The rsync option could work, but then why not just use EBS and
then shut the nodes down to save the rsync work?
Tagging nodes probably won't help in this instance.

Basically if you want to shut everything down you need to go through
recovery, and depending on how long that takes it may not be worth the
cost. This is something you need to test.
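
As a back-of-envelope way to frame that test, the arithmetic might look like the sketch below. All figures are hypothetical placeholders (data size per node, sustained copy throughput, hours offline); real numbers have to come from measuring an actual copy or recovery.

```python
def transfer_hours(data_tb, throughput_mb_s):
    """Hours needed to move data_tb terabytes at throughput_mb_s megabytes per second."""
    megabytes = data_tb * 1000 * 1000
    return megabytes / throughput_mb_s / 3600.0

def net_hours_saved(offline_hours, data_tb, throughput_mb_s, copies=2):
    """Offline hours minus the hours a node runs only to move data.

    copies=2 models one copy out at shutdown and one copy back at startup;
    copies=1 models a restore-only approach where snapshots are taken anyway.
    """
    return offline_hours - copies * transfer_hours(data_tb, throughput_mb_s)

# Hypothetical example: 1 TB per node, 100 MB/s sustained, 48 hours offline.
one_way = transfer_hours(1, 100)
saved = net_hours_saved(48, 1, 100)
print("one-way copy: %.1f h, net saving: %.1f h per node" % (one_way, saved))
```

If the net saving comes out small or negative for realistic throughput, the shutdown is probably not worth it.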


crimondi · February 25, 2015, 1:23pm

The 'i' and 'h' series are attractive because of the disk performance. We
considered using them but it was just not feasible given the volatility of
ephemeral storage.

--
Chris Rimondi | http://twitter.com/crimondi | securitygrit.com


Paul_Sanwald_2 · February 25, 2015, 2:43pm

Thanks, the rsync to EBS is what I was rolling around in my head, but
wasn't sure if it was a dumb idea.

We used to use Elastic Block Store, but have gotten incredible performance
gains from moving to SSD local storage. The ES team doesn't recommend any
kind of NAS
(http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/hardware.html),
and they reiterated in their recent webinar that they couldn't really
recommend EBS. This was exactly in line with our experience: it will work,
but performance is less predictable and certainly degraded compared to
ephemeral storage.

Sounds like I have two options:
1 - shut down and just restore from snapshot when we start back up.
2 - sync local storage to EBS when we shut down, and the reverse when we
start up.

Not sure if the juice is going to be worth the squeeze for either of these
options, but I appreciate everyone's thoughts.
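
For option 1, a minimal sketch of the two calls involved, assuming the snapshot and restore API available since Elasticsearch 1.0 and a snapshot repository that is already registered. The repository name, snapshot name and host are hypothetical; the functions only build the requests.

```python
def snapshot_request(repo, name, host="http://localhost:9200"):
    """Build (method, url) that creates snapshot `name` in repository `repo`."""
    # wait_for_completion makes the call block until the snapshot has finished,
    # so the shutdown script knows when it is safe to stop the nodes.
    url = "%s/_snapshot/%s/%s?wait_for_completion=true" % (host, repo, name)
    return "PUT", url

def restore_request(repo, name, host="http://localhost:9200"):
    """Build (method, url) that restores snapshot `name` from repository `repo`."""
    url = "%s/_snapshot/%s/%s/_restore" % (host, repo, name)
    return "POST", url

REPO = "weekend_backups"      # hypothetical repository name
SNAPSHOT = "friday_shutdown"  # hypothetical snapshot name

friday = snapshot_request(REPO, SNAPSHOT)
monday = restore_request(REPO, SNAPSHOT)
```

Option 2 replaces both calls with a file-level copy, so it needs Elasticsearch stopped on each node while the copy runs.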

Thanks!

--paul


Norberto_Meijome · February 25, 2015, 10:26pm

Yes, of course using EBS all the time would help for storage, but it can't
compete with local SSD in speed.

Chris_Pall · February 25, 2015, 11:39pm

Supposedly doing RAID0 EBS volumes helps to mitigate some I/O issues. You
could go that route and avoid having to refresh, assuming the performance
would be acceptable.
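
A rough sketch of the RAID0 idea, assuming Linux software RAID via `mdadm`. The device names are hypothetical (they depend on how the EBS volumes are attached), and the function only builds the command; it does not run it or create a filesystem.

```python
def raid0_command(devices, array="/dev/md0"):
    """Return an mdadm argv that stripes the given block devices into one RAID0 array."""
    if len(devices) < 2:
        raise ValueError("RAID0 striping needs at least two devices")
    return (["mdadm", "--create", array, "--level=0",
             "--raid-devices=%d" % len(devices)] + list(devices))

# Hypothetical device names for four attached EBS volumes.
EBS_DEVICES = ["/dev/xvdf", "/dev/xvdg", "/dev/xvdh", "/dev/xvdi"]
create_cmd = raid0_command(EBS_DEVICES)
print(" ".join(create_cmd))
```

Striping gives no redundancy: losing any one volume loses the array, so replicas and snapshots still matter.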

