Hello,
I have a question about sizing an Elasticsearch cluster based on throughput.
I'm following this webinar: Elasticsearch sizing and capacity planning.
If I understood correctly, the throughput-based sizing methodology relies on measuring the search speed on the hardware that is planned to host the data nodes. This methodology allows one to determine the number of CPU cores that are needed in total in the cluster to process the expected peak search throughput.
The webinar gives an approximate formula to compute the number of data nodes that are required.
This formula needs these inputs:
- peak searches per second
- the average response time (measured on the given hardware)
- the thread pool size for search operations, which is given by the data node configuration and by default equals
int((number_of_allocated_processors * 3) / 2) + 1
(see also Thread pool size)
The formula is the following:
num_data_node = INT( peak_threads / thread_pool_size ) + 1
where peak_threads, that is, the maximum number of threads that can be active at any one time for search ops, is computed as follows:
peak_threads = INT( peak_search_rate * resp_time ) + 1
where:
- peak_search_rate is the maximum expected rate of search operations (i.e. the number of search operations per second)
- resp_time is the average response time for a search operation, in seconds
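To make sure I am reading the formula correctly, here is a minimal Python sketch of it as I understand it. The function names and the sample values (10 searches per second, 0.5 s average response time, 8 allocated processors) are my own assumptions for illustration and do not come from the webinar.

```python
def search_thread_pool_size(allocated_processors: int) -> int:
    """Default search thread pool size: int((processors * 3) / 2) + 1."""
    return (allocated_processors * 3) // 2 + 1


def peak_threads(peak_search_rate: float, resp_time: float) -> int:
    """Threads busy at peak: INT(rate [ops/s] * response time [s]) + 1."""
    return int(peak_search_rate * resp_time) + 1


def num_data_nodes(peak_search_rate: float, resp_time: float,
                   allocated_processors: int) -> int:
    """Approximate data node count: INT(peak_threads / thread_pool_size) + 1."""
    threads = peak_threads(peak_search_rate, resp_time)
    pool = search_thread_pool_size(allocated_processors)
    return threads // pool + 1


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to exercise the formula.
    print(search_thread_pool_size(8))   # thread pool size per node
    print(peak_threads(10, 0.5))        # threads needed at peak
    print(num_data_nodes(10, 0.5, 8))   # estimated number of data nodes
```

Note that resp_time has to be expressed in seconds so that it is consistent with a rate given in operations per second.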
Now, we are trying to design an Elasticsearch cluster that is used heavily both for searching and writing data.
The particular use case requires the cluster to process a very large number of search and indexing operations.
However, the indexing rate is almost two orders of magnitude higher than the search rate.
To put it into numbers, we expect this situation:
- (peak) search ops rate: ~10 op/sec
- (peak) indexing ops rate: ~800 op/sec
I know that in general an indexing operation should take less time than a search operation, but the indexing rate is much higher.
So my question is, how can I take into account the load generated by the indexing operations in the sizing methodology?
Can I use the same formula that I wrote above, but for indexing operations instead of searches?
How can I combine the result that I get from the search throughput sizing with the one from the indexing throughput sizing?
I would be very happy if someone could point me to some official documentation, blogs, or any other material produced by Elastic that deals with this topic.
Thanks.