Benchmarking es-cluster on AWS

Hi All,

I am running my ES cluster on Kubernetes on AWS, using x2iedn (memory optimized) and i4i (storage optimized) instances.
I benchmarked the cluster on both instance types.

I have 3 data nodes and 1 master node.
Each has a heap size of 31 GB.

The container resources are requested as below.

resources:
  requests:
    cpu: 8
    memory: "64Gi"

I am using the nyc_taxis track to benchmark the cluster.
The indexing throughput is ordered as follows:
32x instance < 16x instance < 8x instance > 4x instance

For example:

4x  ---> 4000 docs/s
8x  ---> 4300 docs/s
16x ---> 3900 docs/s
32x ---> 3800 docs/s

These are just sample figures, not the exact throughput.
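
To make the size of the effect easier to see, the sample figures can be expressed relative to the 4x instance. This is only a small sketch using the illustrative numbers from this post, not measured results; the helper name `relative_change` is made up for the example.

```python
# Illustrative throughput figures from the post (docs/s), not measured values.
samples = {"4x": 4000, "8x": 4300, "16x": 3900, "32x": 3800}


def relative_change(samples, baseline="4x"):
    """Return the percentage change of each entry relative to the baseline."""
    base = samples[baseline]
    return {
        size: round((rate - base) / base * 100, 1)
        for size, rate in samples.items()
    }


changes = relative_change(samples)
for size, pct in changes.items():
    print(f"{size}: {pct:+.1f}% vs 4x")
```

With these sample values the spread between the best and worst instance size is only about 12 percentage points, which is small compared with the difference in instance capacity.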

My settings are:
shards: 20
replicas: 0
bulk size: 10000
clients: 8
refresh_interval: -1

The throughput should increase as we move to higher-capacity instances.

But I see it start to drop at the 16x instance and drop further at the 32x instance (below the 4x instance).

Kindly help me with this.

When it comes to indexing performance, the amount of memory available is less important than the performance of the storage. The instance types you are using seem better suited to high concurrent query loads, as all data might be able to fit in the operating system page cache. What type of storage are you using with the instances?

You should also note that the standard tracks are not necessarily set up to load very powerful nodes by default. You may very well need to tweak settings and concurrency in order to make Rally generate enough load to saturate the cluster, and perhaps even run multiple Rally instances.
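
One way to explore this is to run the same track several times while varying the bulk size and the number of indexing clients, and see where throughput stops improving. The sketch below only generates the command lines for such a sweep. It assumes the nyc_taxis track accepts the `bulk_size` and `bulk_indexing_clients` track parameters (check the track's README for your Rally version), and the target host and the value grid are placeholders.

```python
from itertools import product

# Placeholder address; replace with the cluster's real endpoint.
TARGET_HOSTS = "es-host:9200"


def sweep(bulk_sizes, client_counts):
    """Return one --track-params string per (bulk size, clients) combination."""
    return [
        f"bulk_size:{bulk},bulk_indexing_clients:{clients}"
        for bulk, clients in product(bulk_sizes, client_counts)
    ]


def rally_command(track_params, target_hosts=TARGET_HOSTS):
    """Build an esrally command line against an already running cluster."""
    return (
        "esrally race --track=nyc_taxis --pipeline=benchmark-only "
        f"--target-hosts={target_hosts} --track-params={track_params}"
    )


# Example grid: values are arbitrary starting points, not recommendations.
for params in sweep([5000, 10000], [8, 16, 32]):
    print(rally_command(params))
```

If throughput stays flat across the whole grid, that may point to the load generator or the storage rather than the cluster's CPU or memory.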

I am also curious what you are looking to get out of this exercise. Does the expected workload for the cluster resemble the track you are using at all? I always recommend creating a custom track that represents the data and load you expect in the cluster as realistically as possible, and then using it to get as accurate an estimate as possible.
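
As a rough starting point for such a custom track, the sketch below builds a minimal `track.json` from a file of newline-delimited JSON documents. It follows the general layout of a Rally track (indices, corpora, schedule), but the index name, corpus name, file names and the bulk settings are all placeholders, and a real track would also need an `index.json` with the mappings and settings of the production index.

```python
import json
import os
import tempfile


def build_track(corpus_path, index_name="my-index", bulk_size=5000, clients=8):
    """Return a minimal Rally track definition for one document corpus."""
    # Rally expects one JSON document per line, so lines equal documents.
    with open(corpus_path, "rb") as f:
        doc_count = sum(1 for _ in f)
    return {
        "version": 2,
        "description": "Custom indexing benchmark (placeholder description)",
        "indices": [{"name": index_name, "body": "index.json"}],
        "corpora": [
            {
                "name": "my-corpus",  # hypothetical corpus name
                "documents": [
                    {
                        "source-file": os.path.basename(corpus_path),
                        "document-count": doc_count,
                        "uncompressed-bytes": os.path.getsize(corpus_path),
                    }
                ],
            }
        ],
        "schedule": [
            {"operation": {"operation-type": "delete-index"}},
            {"operation": {"operation-type": "create-index"}},
            {
                "operation": {"operation-type": "bulk", "bulk-size": bulk_size},
                "clients": clients,
            },
        ],
    }


# Demo with a tiny throwaway corpus; use real sample data in practice.
with tempfile.TemporaryDirectory() as workdir:
    corpus = os.path.join(workdir, "documents.json")
    with open(corpus, "w", newline="\n") as f:
        f.write('{"field": 1}\n{"field": 2}\n')
    print(json.dumps(build_track(corpus), indent=2))
```

The document count and byte size are derived from the file itself so that the corpus metadata stays consistent with the data.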

I would also recommend the following resources:

Hi @Christian_Dahlqvist

Thank you for the response.

I am just looking to achieve the maximum indexing throughput by fully utilizing the available resources.

I am looking for any sample data that I can use to saturate my cluster and get the maximum throughput.

I also want my allocated resources to be fully used so that I get the maximum indexing throughput.

Kindly suggest what I should do to achieve this.

I would recommend having a look at the resources I linked to.