Benchmarking es-cluster on AWS

amitsa · June 20, 2022, 11:01am

Hi All,

I am running my Elasticsearch cluster on Kubernetes on AWS, using x2iedn (memory optimized) and i4i (storage optimized) instances.
I benchmarked the cluster on both instance types.

I have 3 data nodes and 1 master node.
Each has a heap size of 31 GB.

The container resources are requested as below.

resources:
  requests:
    cpu: 8
    memory: "64Gi"

I am using the nyc_taxis track to benchmark the cluster.
The indexing throughput I see is ordered as follows:
32x instance < 16x instance < 8x instance > 4x instance

For example, suppose:

4x  ---> 4000 docs/s
8x  ---> 4300 docs/s
16x ---> 3900 docs/s
32x ---> 3800 docs/s

These are just sample numbers, not the exact throughput.
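
To make the trend easier to read, here is a minimal sketch that expresses each sample figure as a percent change relative to the 4x instance. The numbers are the illustrative ones from this post, not measured results, and the function name is just a placeholder.

```python
# Illustrative sample figures from the post (docs/s), not measured results.
sample_docs_per_sec = {"4x": 4000, "8x": 4300, "16x": 3900, "32x": 3800}


def relative_change(throughput, baseline_key="4x"):
    """Return the percent change of each entry relative to the baseline entry."""
    base = throughput[baseline_key]
    return {
        size: round(100.0 * (value - base) / base, 1)
        for size, value in throughput.items()
    }


changes = relative_change(sample_docs_per_sec)
for size, pct in changes.items():
    print(f"{size}: {pct:+.1f}% vs 4x")
```

With the sample figures, the 8x instance is 7.5% above the 4x baseline, while the 16x and 32x instances are 2.5% and 5.0% below it.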

My settings are:
shards: 20
replicas: 0
bulk size: 10000
clients: 8
refresh_interval: -1
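
For reference, settings like these are typically passed to Rally as track parameters. The sketch below only assembles the command line as a list; it does not run anything. The parameter names (number_of_shards, number_of_replicas, bulk_size, bulk_indexing_clients) are the ones I recall from the nyc_taxis track README, so please verify them against your track version. The host name is hypothetical, and refresh_interval is left out because how it is passed depends on the track.

```python
import shlex


def build_rally_command(target_hosts, track_params):
    """Assemble an esrally command line for the nyc_taxis track as a list of arguments."""
    params = ",".join(f"{key}:{value}" for key, value in track_params.items())
    return [
        "esrally",
        "race",
        "--track=nyc_taxis",
        "--pipeline=benchmark-only",  # benchmark an already running cluster
        f"--target-hosts={target_hosts}",
        f"--track-params={params}",
    ]


# Values taken from the settings listed in the post.
# Parameter names are assumed from the nyc_taxis track README.
track_params = {
    "number_of_shards": 20,
    "number_of_replicas": 0,
    "bulk_size": 10000,
    "bulk_indexing_clients": 8,
}

# "es.example.internal:9200" is a hypothetical host used only for illustration.
command = build_rally_command("es.example.internal:9200", track_params)
print(shlex.join(command))
```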

I expected the throughput to increase with higher capacity instances.

But I see it start to drop at the 16x instance and drop further for the 32x instance (below the 4x instance).

Kindly help me with this.

Christian_Dahlqvist · June 20, 2022, 11:14am

When it comes to indexing performance, the amount of memory available is less important than the performance of the storage. The instance type you are using seems better suited to highly concurrent query loads, as all data might be able to fit in the operating system page cache. What type of storage are you using with the instance?

You should also note that the standard tracks are not necessarily set up to load very powerful nodes by default. You may very well need to tweak settings and concurrency in order to make Rally generate enough load to saturate the cluster, and maybe even run multiple Rally instances.
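
One way to approach the concurrency tuning is to rerun the benchmark with increasing client counts and look for the point where throughput stops improving. Below is a minimal sketch of that bookkeeping, assuming you record the throughput per client count yourself. The function names, the doubling schedule, and the 5% threshold are illustrative choices, not Rally features.

```python
def client_sweep(start=8, steps=4):
    """Return a doubling sequence of client counts, e.g. 8, 16, 32, 64."""
    return [start * 2 ** i for i in range(steps)]


def saturation_point(throughput_by_clients, min_gain=0.05):
    """Return the first client count whose next step up gains less than min_gain.

    throughput_by_clients maps client count -> measured throughput and must be
    non-empty. min_gain is a fraction (0.05 means 5%). If every step still
    improves by at least min_gain, the largest client count is returned.
    """
    counts = sorted(throughput_by_clients)
    for current, following in zip(counts, counts[1:]):
        base = throughput_by_clients[current]
        gain = (throughput_by_clients[following] - base) / base
        if gain < min_gain:
            return current
    return counts[-1]


sweep = client_sweep()
print(sweep)
```

If the largest client count in the sweep is returned, the load generator may still be the bottleneck, and a wider sweep or additional Rally instances would be the next thing to try.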

I am also curious what you are looking to get out of this exercise. Does the expected workload for the cluster resemble the track you are using at all? I always recommend creating a custom track that represents the data and load you expect in the cluster as realistically as possible, and then using it to get as accurate an estimate as possible.

I would also recommend the following resources:

amitsa · June 20, 2022, 11:24am

Hi @Christian_Dahlqvist

Thank you for the response.

I am just looking to achieve the maximum indexing throughput by making full use of the available resources.

I am looking for any sample data that I can use to saturate my cluster and reach maximum throughput.

I also want my allocated resources to be fully used so that I get the maximum indexing throughput.

Kindly suggest what I should do to achieve this.

Christian_Dahlqvist · June 20, 2022, 12:01pm

I would recommend having a look at the resources I linked to.

