Elasticsearch Cluster Sizing

The webinar above showcases a bunch of formulas for Elasticsearch cluster sizing. In some discussions, responses suggest unfamiliarity with the formulas and techniques given in the webinar, and the discussions also point out that these formulas have many assumptions behind them. So my questions are:

  1. What are the assumptions behind the following formulas that are not spelled out in the webinar or presentation?

     Volume sizing: (formula image from the webinar, not reproduced here)

     Throughput sizing: (formula image from the webinar, not reproduced here)

  2. I know that sizing depends on many factors and use cases and that there is no fixed technique. After reading some of those discussions I'm starting to question the feasibility of these formulas, so I'm asking for some clarity on them and on the use cases where they are helpful.

  3. My scenario: I want to size a cluster by deciding the number of nodes in it and the resources (disk size, CPU, RAM, etc.) allocated to those nodes. So I'm using Volume sizing to find the total disk space and number of nodes, and Throughput sizing to find the thread pool size. I'm confused about which output (Total Data Nodes) to take as the number of nodes, the one from Volume sizing or the one from Throughput sizing, since both formulas produce Total Data Nodes as their final output.
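For reference, here is a minimal sketch (in Python) of the two calculations as the webinar formulas are commonly summarized. The 0.15 disk-watermark reserve, the 0.10 margin of error, the extra failover node, the memory:data ratio, and all function and parameter names are assumptions for illustration, not the webinar's exact slides:

```python
import math

def volume_sizing(raw_gb_per_day, retention_days, replicas,
                  ram_per_node_gb, memory_data_ratio):
    """Disk-driven sizing: total storage and the data-node count it implies."""
    total_data_gb = raw_gb_per_day * retention_days * (replicas + 1)
    # Reserve headroom for the disk watermark (0.15) and a margin of error
    # (0.10); both constants are assumptions, not authoritative values.
    total_storage_gb = total_data_gb * (1 + 0.15 + 0.10)
    # Assume each node can safely hold RAM * memory:data ratio of data,
    # and add one node for failover headroom.
    node_capacity_gb = ram_per_node_gb * memory_data_ratio
    data_nodes = math.ceil(total_storage_gb / node_capacity_gb) + 1
    return total_storage_gb, data_nodes

def throughput_sizing(peak_searches_per_sec, avg_response_ms, cores_per_node):
    """Query-driven sizing: thread pool size and the data-node count it implies."""
    # Concurrent search threads needed to sustain peak load.
    peak_threads = math.ceil(peak_searches_per_sec * avg_response_ms / 1000)
    # Elasticsearch's search thread pool size per node: int(cores * 3 / 2) + 1.
    thread_pool_size = int(cores_per_node * 3 / 2) + 1
    data_nodes = math.ceil(peak_threads / thread_pool_size)
    return thread_pool_size, data_nodes
```

Note that the two calculations size for different bottlenecks, storage volume versus concurrent search threads, so a cluster presumably needs enough nodes to satisfy both, i.e. the larger of the two Total Data Nodes values.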


Can you please describe your use case?

What type of data are you ingesting?

Actually, I'm trying to build a utility that takes several factors (document size, document count per day, retention period, number of replicas, read rate, write rate, etc.) and outputs how many nodes you need and the amount of storage, RAM, and CPU for those nodes. That's when I came across those formulas and thought they were appropriate for what I'm doing, until I found the discussion mentioned above, which is creating some doubts for me about the formulas. As for the use case: ideally this utility should work for any use case. Is that even possible?
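Using the hypothetical functions sketched above, the utility's inputs might map onto the formulas like this. All values are illustrative, and the 1:30 memory-to-data ratio is only a common rule-of-thumb assumption for hot logging nodes:

```python
# Illustrative inputs only; nothing here is a recommendation.
storage_gb, nodes_by_volume = volume_sizing(
    raw_gb_per_day=50, retention_days=30, replicas=1,
    ram_per_node_gb=64, memory_data_ratio=30)
pool_size, nodes_by_throughput = throughput_sizing(
    peak_searches_per_sec=20, avg_response_ms=200, cores_per_node=8)

# Take the larger node count so both constraints are satisfied.
print(f"storage: {storage_gb:.0f} GB, "
      f"nodes: {max(nodes_by_volume, nodes_by_throughput)}")
```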

This webinar and the given formulas assume a standard logs-and-metrics use case where immutable data is ingested and queried at a reasonable level. They will give you a rough estimate of the cluster and node size needed. There are a lot of factors that affect sizing, e.g. query rates, acceptable query latencies, storage performance, peak-to-average ingest ratios, and data and query complexity, so I always recommend testing with real data or running a realistic benchmark to get a more accurate estimate.

Search use cases are often sized very differently and these formulas would not apply.
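As a rough illustration of what "testing with real data" could look like before setting up a full benchmark (Elastic's Rally tool is the dedicated option for that), here is a minimal ingest-and-search timing sketch. It assumes a local cluster at http://localhost:9200, the official elasticsearch Python client with the 8.x API, and a made-up index name:

```python
import time
from elasticsearch import Elasticsearch, helpers  # official 8.x client assumed

es = Elasticsearch("http://localhost:9200")

def docs(n):
    # Made-up log-like documents; replace with a sample of your real data.
    for i in range(n):
        yield {"_index": "sizing-test",
               "_source": {"message": f"log line {i}", "level": "INFO"}}

n = 10_000
start = time.perf_counter()
helpers.bulk(es, docs(n))
es.indices.refresh(index="sizing-test")
elapsed = time.perf_counter() - start
print(f"indexed {n} docs in {elapsed:.1f} s ({n / elapsed:.0f} docs/s)")

start = time.perf_counter()
es.search(index="sizing-test", query={"match": {"level": "INFO"}})
print(f"search took {(time.perf_counter() - start) * 1000:.0f} ms")
```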

Ok, thanks for your response. Are there any hands-on sessions or tutorials on how to do benchmarking, and how to derive sizing from the results?
