Possible to run with EBS instead of instance store?

VBDave · August 23, 2019, 5:24pm

Hi there,

We're currently running an Elasticsearch cluster as an ELK backend on i3.2xlarge EC2 instances in AWS, with the data partition on the local Instance Store, which is generally what the official Elastic documentation recommends (https://www.elastic.co/guide/en/elasticsearch/plugins/master/cloud-aws-best-practices.html).

However, our log volume has grown to the point where we're close to filling the 2TB instance store disks, which obviously can't be increased. We're already using Curator to archive old logs, so we need more storage.

Rather than scaling instances horizontally we're investigating the possibility of using EBS volumes for the data store instead, which can obviously be scaled larger and (to an extent) have scalable performance, but are unlikely to provide "local SSD" performance of Instance Store drives. They can however be detached from a failed (e.g. hardware failure) instance and attached to a healthy one, whereas obviously instance store data is ephemeral.

Has anyone done this successfully, or got any data or benchmarks? For reference, our PROD ELK cluster has a log volume of ~150GB a day. The official Elasticsearch documentation gets a bit vague on EBS volumes versus Instance Stores, saying that EBS should be for "smaller" clusters and to "make sure you have enough IOPS".

I'd love to know if anyone has any experience with this. Do EBS volumes just get prohibitively expensive to get the IOPS required for a "larger" cluster?

Thanks for any help/thoughts/experience!

Christian_Dahlqvist · August 25, 2019, 5:30am

I would suggest setting up a hot/warm architecture where you index data into the i3.2xlarge instances and then, after a few days, move indices that are no longer written to onto nodes backed by gp2 EBS. Large gp2 EBS volumes are considerably slower than local SSDs but get a decent amount of IOPS even without provisioned IOPS and should be able to handle querying quite well. Indexing is however very I/O intensive and probably best left to the current nodes (although you may get away with fewer than you have now).
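
As a rough illustration of the hot/warm move described above, here is a minimal Python sketch that builds the index settings update used to relocate an index to warm nodes. It assumes the nodes carry a custom attribute in elasticsearch.yml (for example `node.attr.box_type: hot` on the i3 nodes and `node.attr.box_type: warm` on the EBS-backed nodes). The attribute name `box_type` and the index name are illustrative assumptions, not something stated in this thread.

```python
import json

# Assumption: each node is tagged with a custom attribute named "box_type"
# ("hot" for instance-store nodes, "warm" for EBS-backed nodes).
# The attribute name is a convention chosen for this sketch, not a built-in.
VALID_TIERS = ("hot", "warm")


def allocation_settings(tier: str) -> dict:
    """Build a settings body that requires shards to live on one tier."""
    if tier not in VALID_TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    return {"index.routing.allocation.require.box_type": tier}


def settings_request(index: str, tier: str) -> tuple:
    """Return (method, path, body) for the index settings update call."""
    return "PUT", f"/{index}/_settings", json.dumps(allocation_settings(tier))


if __name__ == "__main__":
    # Hypothetical daily index that is no longer being written to.
    method, path, body = settings_request("logs-2019.08.20", "warm")
    print(method, path)
    print(body)
```

Sending that request to the cluster asks Elasticsearch to relocate the index's shards to nodes whose `box_type` is `warm`. Curator, which the original poster already runs, can apply the same allocation change on a schedule.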

martinr_ubi · September 5, 2019, 8:31am

Hi,

Have you considered that an i3en.2xlarge costs less than an r5.2xlarge+5TB@gp2?

An i3.4xlarge is also only slightly more costly than an r5.2xlarge+5TB@gp2 and you get more cpu and RAM for the difference.
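
To put a number on the IOPS side of that comparison, here is a small Python sketch of the gp2 baseline IOPS rule. It assumes the figures AWS published for gp2 at the time of this thread: 3 IOPS per GiB, a floor of 100 and a cap of 16,000. Burst credits and throughput limits are ignored, so treat it as an estimate only.

```python
# Assumed gp2 figures (AWS documentation, 2019): 3 IOPS per GiB,
# minimum 100 IOPS, maximum 16,000 IOPS per volume.
GP2_IOPS_PER_GIB = 3
GP2_MIN_IOPS = 100
GP2_MAX_IOPS = 16_000


def gp2_baseline_iops(size_gib: int) -> int:
    """Estimate the baseline (non-burst) IOPS of a gp2 volume."""
    if size_gib <= 0:
        raise ValueError("volume size must be positive")
    scaled = size_gib * GP2_IOPS_PER_GIB
    # Clamp the size-scaled figure between the floor and the cap.
    return max(GP2_MIN_IOPS, min(GP2_MAX_IOPS, scaled))


if __name__ == "__main__":
    for size in (500, 2048, 5120):
        print(f"{size} GiB gp2 -> {gp2_baseline_iops(size)} baseline IOPS")
```

Under those assumptions a 5 TiB (5120 GiB) gp2 volume sits just below the per-volume cap, so growing the volume further adds capacity but very little IOPS.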

Each use case is unique. I’m mostly encouraging you to look at all the options with instance storage before considering EBS, which is usually not the best fit for software that manages durability via clustering/replication.
In the end, of course, only you can weigh all the factors and benchmark the impact of changing the storage backend.

I have been running i3en.3xlarge instances as my cluster’s data nodes since they came out, and I can confirm they can be a good fit for ES use cases.

For data nodes, look at d2 instances for warm/cold and i3 or i3en instances for hot.
