Huge Difference in the throughput for index

amitsa · June 28, 2022, 5:57am

I am running an Elasticsearch cluster on Kubernetes on AWS 14i_16xlarge instances.
The pods listed below each run on a different AWS instance.

Track: nyc_taxis

1 master
1 data node
1 esrally client

I am trying to measure indexing throughput only.

I see a difference in the results when using --include-tasks="index".

For example:

Without --include-tasks="index":

Throughput for index: 220000 docs/sec

With --include-tasks="index":

Throughput for index: 920000 docs/sec

Bradley_Deam · July 4, 2022, 1:01am

Hi @amitsa -

The --include-tasks argument means you'll only ever execute tasks of type index. Without seeing your esrally invocation or the results summary, it's difficult to reason about what might be happening.

That said, this seems a little like an XY problem. Unless your production dataset looks like that of the NYC Taxis (mappings, fields etc.), then the indexing throughput numbers are likely to be unrealistic, and in some cases completely invalid.

If you're trying to ascertain which instance types provide the best cost performance for your cluster, then it's imperative that you spend the time to model something akin to your production workload to ensure that any benchmarks are at least somewhat representative of what your cluster may need to handle once in production.

You can do this by creating your own track.

amitsa · July 5, 2022, 5:39am

Hi @Bradley_Deam

A small correction to the values:

Without --include-tasks="index":

Throughput for index: 220000 docs/sec

With --include-tasks="index":

Throughput for index: 92000 docs/sec

I have seen similar differences on a bare-metal platform while running the same test.
I am not trying to match any particular production dataset to the nyc_taxis track.
I am just comparing performance across different platforms using nyc_taxis.

The whole execution takes a long time and I am only interested in indexing performance, so I opted for the --include-tasks flag, but I see a difference in the throughput.

I don't see any errors or issues in the logs.
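
To put a number on the gap, here is a quick calculation using the two corrected figures above. This is only a sketch: the function and variable names are made up for illustration, and the inputs are the docs/sec values quoted in this post, not values read from a Rally report.

```python
def throughput_ratio(full_run: float, filtered_run: float) -> float:
    """Return how many times larger the full-run throughput is."""
    if filtered_run <= 0:
        raise ValueError("filtered_run must be positive")
    return full_run / filtered_run


# Values quoted in this thread (docs/sec), entered by hand.
full_run_docs_per_sec = 220_000   # without --include-tasks="index"
filtered_docs_per_sec = 92_000    # with --include-tasks="index"

ratio = throughput_ratio(full_run_docs_per_sec, filtered_docs_per_sec)
# Relative drop of the filtered run compared with the full run, in percent.
drop_pct = (1 - filtered_docs_per_sec / full_run_docs_per_sec) * 100

print(f"ratio: {ratio:.2f}x, drop: {drop_pct:.1f}%")
```

So the run with the flag reports roughly 2.4 times lower indexing throughput than the full run, a drop of about 58%.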

