Could I get some comments, concerns, or insights on the following resource (CPU, memory, and disk size) configuration for one of my Elasticsearch clusters?
Data volume:
throughput: 18K docs/second (a very continuous load)
size: 720 GB per day
Index settings (see the template sketch below):
replicas: 1
shards: 18
Node configuration:
3 coordinating nodes:
for each node: 8 CPUs, 32 GB memory, 16 GB Java heap
3 master nodes:
for each node: 1 CPU, 8 GB memory, 4 GB Java heap, 50 GB SSD disk
6 data nodes:
for each node: 20 CPUs, 100 GB memory, 32 GB Java heap, 10 TB data disks
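For context, here is a minimal sketch of how the shard/replica settings above could be applied through a legacy index template on 7.1 (the template name, index pattern, and host are placeholder assumptions, not my actual values):

```python
# Minimal sketch: apply the shard/replica settings via a legacy index template.
# (The composable _index_template API only arrived in 7.8, so 7.1 uses _template.)
# The template name, index pattern, and host are illustrative assumptions.
import requests

ES = "http://localhost:9200"  # assumed coordinating-node endpoint

template = {
    "index_patterns": ["logs-*"],   # assumed daily-index naming scheme
    "settings": {
        "number_of_shards": 18,     # 18 primary shards per index
        "number_of_replicas": 1,    # 1 replica for high availability
    },
}

resp = requests.put(f"{ES}/_template/logs-template", json=template)
resp.raise_for_status()
```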
1. Each document is a log line, e.g. log4j or nginx logs. The size of each document is between 1000 and 2000 bytes.
2. I am setting the retention period to 30 days (see the ILM sketch after this list).
3. We are using Kibana to query the logs. The query latency requirement is not strict: less than one minute for a complicated query, and a few seconds for normal queries. Besides that, I have a job that periodically queries the latest document to calculate the lag between the timestamp in the document and the time it was indexed, and that calls some _cat APIs every 30 seconds to get the current state of the cluster (see the monitoring sketch after this list).
4. We are using version 7.1. By the way, I think upgrading from 7.1 to a later 7.x release should not be as hard as upgrading from 6.x to 7.x, right?
5. We are using SSD-backed EBS volumes (e.g. io1 on AWS).
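For the 30-day retention in point 2, a delete-phase ILM policy is one option on 7.1. A minimal sketch, with a placeholder policy name and host:

```python
# Minimal sketch, assuming ILM on Elasticsearch 7.1: delete indices 30 days
# after they are created/rolled over. Policy name and host are placeholders.
import requests

ES = "http://localhost:9200"  # assumed coordinating-node endpoint

policy = {
    "policy": {
        "phases": {
            "delete": {
                "min_age": "30d",          # 30-day retention from point 2
                "actions": {"delete": {}}  # drop the index once it ages out
            }
        }
    }
}

resp = requests.put(f"{ES}/_ilm/policy/logs-30d-retention", json=policy)
resp.raise_for_status()
```

The policy would still need to be attached to the indices, e.g. via index.lifecycle.name in the index template.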
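The monitoring job in point 3 does roughly the following; this is a simplified sketch, and the index pattern, timestamp field, and host are placeholders rather than my real names:

```python
# Rough sketch of the periodic monitoring job from point 3 (names are assumed):
# 1) fetch the newest document and compute indexing lag from its timestamp,
# 2) poll a _cat API every 30 seconds for a view of cluster state.
import time
from datetime import datetime, timezone

import requests

ES = "http://localhost:9200"   # assumed coordinating-node endpoint
INDEX = "logs-*"               # assumed index pattern
TS_FIELD = "@timestamp"        # assumed timestamp field in the documents

def indexing_lag_seconds():
    """Return now minus the timestamp of the most recently indexed document."""
    query = {
        "size": 1,
        "sort": [{TS_FIELD: {"order": "desc"}}],
        "_source": [TS_FIELD],
    }
    resp = requests.post(f"{ES}/{INDEX}/_search", json=query)
    resp.raise_for_status()
    ts = resp.json()["hits"]["hits"][0]["_source"][TS_FIELD]
    # assumes an ISO-8601 timestamp ending in "Z"
    doc_time = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return (datetime.now(timezone.utc) - doc_time).total_seconds()

def cluster_state():
    """Fetch a compact view of node load via the _cat API."""
    resp = requests.get(f"{ES}/_cat/nodes?v&h=name,heap.percent,cpu,load_1m")
    resp.raise_for_status()
    return resp.text

while True:
    print(f"indexing lag: {indexing_lag_seconds():.1f}s")
    print(cluster_state())
    time.sleep(30)             # the 30-second polling interval
```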
If we make the simplified assumption that your data will take up the same size on disk as the raw size, and that you will have a replica for high availability, you will generate 1.44 TB of indices per day. Over the 30-day retention that is roughly 43 TB, which works out to around 7 TB of data per data node. As the nodes will be handling a lot of indexing as well as querying, I would not be surprised to see some heap pressure before you reach that volume. I would therefore suspect you might need a larger cluster in terms of data nodes, but the only way to know for sure is to test.
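The back-of-the-envelope arithmetic behind those figures, as a small sketch (the 1:1 raw-to-disk ratio is the simplifying assumption stated above, not a measured number):

```python
# Back-of-the-envelope sizing from the reply above; the 1:1 raw-to-disk ratio
# is the stated simplifying assumption, not a measured figure.
raw_per_day_tb = 0.72       # 720 GB of raw logs per day
replicas = 1                # one replica copy of each shard
retention_days = 30
data_nodes = 6
disk_per_node_tb = 10

daily_on_disk = raw_per_day_tb * (1 + replicas)    # 1.44 TB/day
total_on_disk = daily_on_disk * retention_days     # ~43.2 TB
per_node = total_on_disk / data_nodes              # ~7.2 TB per data node

print(f"per data node: {per_node:.1f} TB of {disk_per_node_tb} TB disk "
      f"({per_node / disk_per_node_tb:.0%} full)")
```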