Need Help with Hardware configuration for EK-Stack for 2.5 TB of data per day

es_noobb · December 20, 2017, 10:29am

Hi everyone, this is my first post here. I am planning to deploy a cluster that will receive around 2.5 TB of data per day. The logs will be parsed locally on a few Logstash instances and sent to AWS, which will host the Elasticsearch instances. So far I have come up with the configuration below:

https://calculator.s3.amazonaws.com/index.html#r=BOM&key=calc-200ABCF4-974C-4924-B3CE-E935B640D496

It's basically 3x master and 4x data nodes on c4 and c2 xlarge instances, with 19 times 16.3 TB of SSD. We will be using S3 since the log retention period is 3 months. Our plan is to keep one month of logs on the SSDs for evaluation and 2 months of raw logs, which comes to approx. 80 TB, in S3 buckets, so that we can index them as and when we need using Lambda. We will be using Curator to delete older data.

I need to understand whether the above hardware configuration would be fine. If not, what would the CPU and memory requirements be?
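As a rough aid for the sizing question above, the disk needed for the searchable month can be sketched from the figures in this post. The replica count and the 15% free-space headroom are assumptions, not values from the post, and the on-disk index size is assumed equal to the raw log size, which should be measured on real data.

```python
def hot_storage_tb(daily_tb, retention_days, replicas=1, headroom=0.15):
    """Estimate disk needed for the searchable (hot) tier, in TB.

    replicas and headroom are illustrative assumptions. Index size is
    assumed equal to raw log size; real ratios vary with mappings.
    """
    primary_tb = daily_tb * retention_days      # one copy of the data
    total_tb = primary_tb * (1 + replicas)      # primaries plus replica copies
    return total_tb / (1 - headroom)            # keep some disk free


if __name__ == "__main__":
    # 2.5 TB/day kept searchable for 30 days, as described in the post
    print(round(hot_storage_tb(2.5, 30), 1), "TB with 1 replica")
    print(round(hot_storage_tb(2.5, 30, replicas=0), 1), "TB without replicas")
```

Running a small sample of real logs through the pipeline and comparing raw size to index size would replace the 1:1 assumption with a measured ratio.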

warkolm · December 20, 2017, 10:31am

That's quite a lot for not many nodes. What testing have you done?

es_noobb · December 20, 2017, 11:19am

Hi warkolm,
There was a small error: it's 1.5 TB of logs per day. The log volume will increase to 2.5 TB within a month. So I am starting off with the 4 data nodes mentioned above, which we may have to increase. So far I have only tested on smaller nodes, but we will be moving to production soon. We will have an EPS count of approx. 60,000. Kibana will send around 200-250 correlation requests per day.

So, what should be the recommended configuration for such log size and such EPS count?
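The daily volume and the EPS figure can be cross-checked against each other. The sketch below assumes 60,000 EPS is a sustained average over the whole day and that TB means 10^12 bytes; neither is stated in the post.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def avg_event_bytes(daily_tb, eps, bytes_per_tb=10**12):
    """Average bytes per event implied by a daily volume and a steady EPS."""
    events_per_day = eps * SECONDS_PER_DAY
    return daily_tb * bytes_per_tb / events_per_day


if __name__ == "__main__":
    for tb in (1.5, 2.5):
        print(f"{tb} TB/day at 60000 EPS: {avg_event_bytes(tb, 60000):.0f} bytes/event")
```

Under those assumptions 1.5 TB per day works out to roughly 289 bytes per event. If the events are noticeably larger or smaller than that, either the volume or the EPS estimate is off.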

Christian_Dahlqvist · December 20, 2017, 4:41pm

The amount of data a single node can handle will often in the end be limited by how much heap it has available, as each shard comes with some overhead in terms of memory usage. 4 data nodes sounds very little for that amount of data and retention period, so I would recommend running a benchmark to find out how much data one of your nodes can take while still having enough heap available to serve queries. This process is discussed in this Elastic{ON} talk.
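Before running such a benchmark, a quick estimate of shard counts and shard sizes can show whether a layout is even plausible. The sketch below assumes one index per day, 5 primary shards per index, and 1 replica. None of these values appear in the thread. The 20 shards per GB of heap limit is only a commonly quoted rule of thumb, used here as a hypothetical default and not as a measured limit.

```python
def shards_per_node(retention_days, primaries_per_index, replicas,
                    data_nodes, indices_per_day=1):
    """Average number of shards each data node would hold."""
    total = retention_days * indices_per_day * primaries_per_index * (1 + replicas)
    return total / data_nodes


def primary_shard_gb(daily_tb, primaries_per_index, indices_per_day=1):
    """Average primary shard size in GB, assuming index size equals raw size."""
    return daily_tb * 1000 / (primaries_per_index * indices_per_day)


def within_heap_guideline(shards_on_node, heap_gb, max_shards_per_gb=20):
    """Check a shard count against a rule-of-thumb limit (an assumption)."""
    return shards_on_node <= heap_gb * max_shards_per_gb


if __name__ == "__main__":
    print(shards_per_node(30, 5, 1, data_nodes=4), "shards per node")
    print(primary_shard_gb(2.5, 5), "GB per primary shard")
```

With these assumed settings the shard count per node is modest, but each primary shard would be around 500 GB, which suggests the shard layout needs as much attention as the node count. Only a benchmark on real data settles it.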

salamandar_joseph · December 21, 2017, 5:32am

We are using ES as a search engine without Kibana, where our queries are around 400-500 per day with normal API calls from a website. We have a volume of around 1 TB per day and a 2-month retention period. When we started benchmarking the cluster, we used the default shard configuration, and we are still using it. Our final cluster size is 3 master nodes and 12 data nodes, each with 64 GB of RAM and an 8-core processor. We started with normal spinning disks at around 10k rpm, then 15k, but they were all very sluggish and gave us bad response times. We then moved to EC2 gp2 SSDs, and the cluster is currently working well. We tried scaling the nodes, but it was the worst possible thing we did, and I would never recommend it. Instead, we have a dedicated 2-person team who consistently monitor the health; we add nodes as and when our data changes, then move the data from the new node manually and delete it safely later.

warkolm · December 21, 2017, 5:37am

Elasticsearch doesn't need full time attendance like this, it sounds like you have a few problems that might be solved via other more efficient means?

es_noobb · December 21, 2017, 5:41am

Hi Joseph,
By scaling, did you mean EC2 scaling? If so, what issues did you face doing that?

salamandar_joseph · December 21, 2017, 5:43am

We don't have any issue with the health of ES. It is normally green, going to a yellow state 2-3 times a month. When I said monitoring team, I meant that we don't use autoscaling for handling ES; we do it manually. And what other efficient means did you mean? Can you explain?

warkolm · December 21, 2017, 5:44am

Why do you need two people to do this?

salamandar_joseph · December 21, 2017, 5:48am

They monitor not only ES health but also handle management of the clusters. We don't use support from our vendor.

salamandar_joseph · December 21, 2017, 5:52am

We lost data with scaling, and our cluster shot straight to a red state since it was unable to find the shards that were on the scaled node.

es_noobb · December 21, 2017, 5:54am

Hi Warkolm,
What is your experience with autoscaling of nodes under pressure in AWS? Any ideas?

warkolm · December 21, 2017, 5:55am

It's not something I have done, I don't run clusters these days and haven't run production level ones for a while.

But we have users doing this and it works fine.

es_noobb · December 21, 2017, 6:00am

Hi warkolm,
Any ideas on how to downsize a node once the peak pressure is gone when autoscaling?

warkolm · December 21, 2017, 6:13am

I wouldn't, I would horizontally scale. Both up and down.

es_noobb · December 21, 2017, 6:15am

Yes, what I meant was: how do you take down a node after the peak pressure is gone, since the new node will contain shards and indices too?

Christian_Dahlqvist · December 21, 2017, 9:01am

Using auto scaling when nodes hold a lot of data is generally a bad idea as this will result in a lot of data being transferred, adding extra load, at exactly the wrong time. I have seen it used, but generally for search use cases with small data volumes and high query rates where all nodes hold all data.
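On the earlier question of taking a node down safely: one common approach is to exclude the node from shard allocation first, so Elasticsearch relocates its shards elsewhere, and to stop the node only once it holds no shards. The sketch below builds such a request with the `cluster.routing.allocation.exclude._name` cluster setting. The URL and node name are hypothetical placeholders, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def exclude_node_body(node_name):
    """Cluster settings body that moves shards off the named node."""
    return {
        "transient": {
            "cluster.routing.allocation.exclude._name": node_name
        }
    }


def build_drain_request(es_url, node_name):
    """Build (but do not send) a PUT to the _cluster/settings endpoint."""
    payload = json.dumps(exclude_node_body(node_name)).encode("utf-8")
    return urllib.request.Request(
        url=es_url.rstrip("/") + "/_cluster/settings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    # Hypothetical endpoint and node name, for illustration only
    req = build_drain_request("http://localhost:9200", "data-node-5")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Sending it would be a call to `urllib.request.urlopen(req)` against a real cluster. As noted above, the relocation itself moves a lot of data, so it is better done outside peak load.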

es_noobb · December 21, 2017, 9:51am

Thanks Christian.

gerilya · December 26, 2017, 12:10pm

It looks like you have in total 60TB of gp2 EBS storage and 12x R4.2xlarge instances for data nodes.
You could immediately achieve a more than $1200/mo saving by moving to I3.2xlarge instances, which have the same amount of memory and CPUs, but also include fast NVMe-based ephemeral storage.

If you subtract the price of R4.2xlarge from that of I3.2xlarge, you essentially get the price of 1.9 TB of storage, which works out almost 2 times cheaper than EBS gp2.
Additionally, you should gain some performance improvement (just make sure to use ENA-enabled AMIs).
Just my 2 cents...
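The subtraction described above can be written out as a small calculation. The hourly prices in the example are hypothetical placeholders, not real AWS prices, and 730 hours per month is an averaging assumption. Substitute the current prices for your region before drawing conclusions.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def implied_storage_price(i3_hourly, r4_hourly, local_storage_gb=1900):
    """Price per GB-month of the I3 local NVMe storage, taken as the
    instance price difference spread over the included capacity."""
    monthly_difference = (i3_hourly - r4_hourly) * HOURS_PER_MONTH
    return monthly_difference / local_storage_gb


if __name__ == "__main__":
    # Placeholder hourly prices; look up real ones for your region
    price = implied_storage_price(i3_hourly=0.60, r4_hourly=0.50)
    print(f"implied local storage price: ${price:.4f} per GB-month")
```

Comparing the result with the gp2 per-GB-month price for the same region shows whether the saving holds. Note that ephemeral storage is lost when an instance is stopped or terminated, so replicas matter more in that setup.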

system · January 23, 2018, 12:10pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
