Elastic sizing

Hi,

Architecture 1:

I have designed an Elasticsearch architecture for 1GB of daily incoming data with a 365-day retention period.

3 master nodes with 8 CPUs, 32GB RAM, and 50GB disk space per node.
8 data nodes with 8 CPUs, 64GB RAM, and 7.5TB disk space per node.
3 client nodes with 8 CPUs, 32GB RAM, and 50GB disk space per node.
No. of primary shards: 1 per index
No. of replica shards: 2 per index
No. of indices: 10

Architecture 2:

This Elasticsearch architecture was designed for us by another team, using the same parameters: 1GB of daily incoming data with a 365-day retention period.

5 nodes, each acting as both a master and a data node, with 8 CPUs, 64GB RAM, and 3TB disk space per node.

No. of primary shards: 2 per index
No. of replica shards: 2 per index
No. of indices: 10

I'm really confused about which architecture would be better.
Any suggestions would help me proceed further.

It sounds like you plan to generate 20 primary shards and 40 replica shards per day. Over 365 days that gives you 21,900 shards, which is a lot for a system ingesting only 1GB of data per day. You can read this blog post about shards and sharding to learn why this is not a good idea.
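For reference, here is the arithmetic behind those numbers written out as a small sketch. It assumes Architecture 2's settings of 2 primaries and 2 replicas per index, with one new set of the 10 indices created each day:

```python
# Back-of-the-envelope shard count for daily indices
# (Architecture 2: 10 indices per day, 2 primaries and 2 replicas per index).
indices_per_day = 10
primaries_per_index = 2
replicas_per_primary = 2

primary_shards_per_day = indices_per_day * primaries_per_index          # 20
replica_shards_per_day = primary_shards_per_day * replicas_per_primary  # 40
shards_per_day = primary_shards_per_day + replica_shards_per_day        # 60

retention_days = 365
total_shards = shards_per_day * retention_days
print(total_shards)  # 21900
```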

If you need to maintain 10 separate indices and are not able to consolidate them, I would recommend that you instead use monthly indices, potentially with a single primary shard. This will give you a much more manageable 30 shards per month - 360 total shards over the year.
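As a minimal sketch of what that could look like, assuming Elasticsearch 7.8+ and its composable index template API; the `myapp-*` index pattern, template name, and cluster URL below are placeholders rather than anything from your setup:

```python
# Hypothetical sketch: a composable index template that gives each monthly
# index 1 primary shard and 2 replicas. Names and URL are placeholders.
import requests

ES_URL = "http://localhost:9200"  # assumption: local, unsecured cluster

template = {
    "index_patterns": ["myapp-*"],      # e.g. myapp-2024.01, myapp-2024.02, ...
    "template": {
        "settings": {
            "number_of_shards": 1,      # single primary shard per monthly index
            "number_of_replicas": 2,    # two replica copies, as in your design
        }
    },
}

resp = requests.put(f"{ES_URL}/_index_template/myapp-monthly", json=template)
resp.raise_for_status()
print(resp.json())
```

You would apply the same settings to each of the 10 index families if they keep separate naming patterns.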

If we instead look at data volume, you are only indexing 365GB of raw data per year as far as I understand. If we make the simplifying assumption that this size stays the same when indexed to disk, the 2 replica shards give you an estimated total indexed volume of around 1.1TB. I would expect a cluster with 3 master/data nodes to handle this easily, so unless you have factored in a lot of growth, I would say both architectures may be oversized.
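The volume estimate works out like this, under the same simplifying assumption that indexed size equals raw size:

```python
# Rough yearly storage estimate, assuming the on-disk indexed size equals
# the raw size and that the data is kept as 1 primary + 2 replica copies.
daily_raw_gb = 1
retention_days = 365
copies = 1 + 2  # 1 primary + 2 replicas

total_gb = daily_raw_gb * retention_days * copies
print(f"{total_gb} GB, roughly {total_gb / 1000:.1f} TB")  # 1095 GB, roughly 1.1 TB
```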

Thanks Christian_Dahlqvist. The data is not always going to be 1GB; during an outage we may receive up to 1TB of data. How can I plan for that?

Thanks,
Saravanan

I am not sure I understand your question. Could you please clarify?

Normally we get 1GB of data daily, but during an outage we can get up to 1TB of data. During an outage we receive a huge volume of alerts from the data center.

That is a huge difference. You will probably need to size for the larger volume, which will require a larger cluster than I mentioned, but it is hard for me to say how large based on that information alone. I would recommend running some benchmarks to determine the correct size.

Which architecture should I follow? Also, can you suggest benchmarking tools that would help me figure out the correct size?

If you get 1TB of data in a day, how is it distributed over time? How much lag in ingesting it can you tolerate?

Determining this will give you the peak indexing rate that your cluster needs to support. You can then benchmark what size cluster you need to sustain that rate for a period of time, e.g. using Rally. Make sure that you also include realistic levels of querying in your benchmark.
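To make that concrete, here is a rough way to turn an outage volume into a peak indexing rate; the burst window and average document size below are assumed figures for illustration only:

```python
# Rough peak indexing rate for the outage scenario. The burst window and the
# average document size are assumptions made purely to illustrate the
# calculation; substitute the values you actually observe.
outage_volume_gb = 1000      # ~1 TB arriving during an outage
burst_window_hours = 4       # assumption: most of it arrives within 4 hours
avg_doc_size_kb = 1          # assumption: ~1 KB per alert document

gb_per_second = outage_volume_gb / (burst_window_hours * 3600)
mb_per_second = gb_per_second * 1024
docs_per_second = mb_per_second * 1024 / avg_doc_size_kb

print(f"~{mb_per_second:.0f} MB/s, ~{docs_per_second:,.0f} docs/s to sustain")
```

That docs-per-second figure is what your benchmark would need to show the cluster sustaining, alongside your normal query load.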
