Technicality on throughput-based cluster sizing


I have a question about sizing an Elasticsearch cluster based on throughput.

I'm following this webinar: Elasticsearch sizing and capacity planning.

If I understood correctly, the throughput-based sizing methodology relies on measuring the search speed on the hardware that is planned to host the data nodes. This methodology allows one to determine the number of CPU cores that are needed in total in the cluster to process the expected peak search throughput.

The webinar gives an approximate formula to compute the number of data nodes that are required.

This formula needs these inputs:

  • peak searches per second
  • the average response time (measured on the given hardware)
  • the thread pool size for search operations, which is given by the data node configuration, by default equal to (number_of_allocated_processors * 3/2) + 1 (also seen in Thread pool size)

The formula is the following:

num_data_node = INT( peak_threads / thread_pool_size ) + 1

where peak_threads, that is the maximum number of threads that can be active at all times for search ops, is computed as follows:

peak_threads = INT( peak_search_rate * resp_time ) + 1


  • peak_search_rate is the maximum expected rate of search operations (i.e. num search operations per second)
  • resp_time is the average response time for a search operation

Now, we are trying to design an Elasticsearch cluster that is used heavily both for searching and writing data.
The particular use case requires the cluster to process a very large number of search and indexing operations.
Though, indexing operation rate is almost 3 order of magnitude higher than the search ops rate.
To put it into numbers, we expect this situation:

  • (peak) search ops rate: ~10 op/sec
  • (peak) indexing ops rate: ~800 op/sec

I know that in general an indexing operation should take less than a search operation, but the rate is much larger.

So my question is, how can I take into account the load generated by the indexing operations in the sizing methodology?
Can I use the same formula that I wrote above, but for indexing operations instead of searches?
How can I combine the result that I get from the search throughput sizing with the one from the indexing throughput sizing?

I would be very happy if someone could point me to some official documentation, blogs, or any other material produced by Elastic that deals with this argument.


Indexing and querying in Elasticsearch can be very I/O intensive, so it is not a given that throughput will be limted by CPU. It could just as well be disk I/O, depending on how much data you index and query.

In my experience there is no way to analytically determine this for mixed loads so I would recommend you run some benchmarks with realistic data, volumes and load to see exactly how your data and queries behave and how this relates to your latency requirements.

Hi @Christian_Dahlqvist,

thanks for your reply.

What I was looking for is not an exact formula, but an approximate method to determine the size of the cluster.
I know that running tests with realistic data is the only proper way to actually see what is the performance of the hardware, but I wanted to first have a sketch of the amount of required resources, expecially for cases in which one has to ask for such resources a long time in advance.

But let me rephrase the problem.
Suppose I have a cluster composed of just 1 node (think of it as my test environment) and I'm able to run benchmark tests in order to measure the average response times for search and indexing operations, given a fixed throughput (rate of processed requests).
Now, my production cluster would be made of identical instances of this node (same hardware).
My question is, is there a way to determine how many of these nodes I need in order for the cluster to process the expected throughtput?

My reasoning would be something like this for search operations.
I know that a single node can process a maximum rate R_0 of search requests with an average response time t_search .
Ideally, if I wanted the production cluster to sustain a maximum rate R of search requests ensuring that the average response time t_search be the same (or lower), I would need a number of data nodes approximately equal to:

n = INT( R / R_0 ) + 1

and I would configure each index to have a number of replicas equal to n-1, so that each data node has 1 copy of each shard.

Do you think this reasoning is valid?

Now, in my case, the cluster would be subjected also to heavy load due to indexing operations.
So I need a way to combine both the effects of search and indexing operations.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.