[WARNING] No throughput metrics available for [index-append]. Likely cause: The benchmark ended already during warmup

amitsa · June 11, 2022, 10:52am

The cluster is running on Kubernetes.

NUMBER_OF_SHARDS=${NUMBER_OF_SHARDS:-10}
NUMBER_OF_REPLICAS=${NUMBER_OF_REPLICAS:-0}
INGEST_PERCENTAGE=${INGEST_PERCENTAGE:-100}
BULK_SIZE=${BULK_SIZE:-5000}
BULK_INDEXING_CLIENTS=${BULK_INDEXING_CLIENTS:-8}
REFRESH_INTERVAL=${REFRESH_INTERVAL:--1}

Heap size: 31 GB
Track: geopoint
1 data node: 31 GB heap, 16 vCPU
1 master node: 31 GB heap, 16 vCPU

The index-append error rate is 0.00%.
Kindly help: why am I getting this error?

I don't see any errors in the logs either.

    ____        ____
3:    / __ \____ _/ / /_  __
3:   / /_/ / __ `/ / / / / /
3:  / _, _/ /_/ / / / /_/ /
3: /_/ |_|\__,_/_/_/\__, /
3:                 /____/
3:
3: [INFO] Decompressing track data from [/rally/.rally/benchmarks/data/geopoint/documents.json.bz2] to [/rally/.rally/benchmarks/data/geopoint/documents.json] (resulting size: [2.28] GB) ... [OK]
3: [INFO] Preparing file offset table for [/rally/.rally/benchmarks/data/geopoint/documents.json] ... [OK]
3: Running delete-index                                                           [100% done]
Running create-index                                                           [100% done]
Running check-cluster-health                                                   [100% done]
Running index-append                                                           [100% done]
Running refresh-after-index                                                    [100% done]
Running force-merge                                                            [100% done]
Running refresh-after-force-merge                                              [100% done]
Running wait-until-merges-finish                                               [100% done]
Running polygon                                                                [100% done]
Running bbox                                                                   [100% done]
Running distance                                                               [100% done]
Running distanceRange                                                          [100% done]
[INFO] Racing on track [geopoint], challenge [append-no-conflicts] and car ['external'] with version [8.2.0].
3:
3:
3: ------------------------------------------------------
3:     _______             __   _____
3:    / ____(_)___  ____ _/ /  / ___/_________  ________
3:   / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
3:  / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
3: /_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
3: ------------------------------------------------------
3:
3: Metric,Task,Value,Unit
3: Cumulative indexing time of primary shards,,12.6378,min
3: Min cumulative indexing time across primary shards,,2.44865,min
3: Median cumulative indexing time across primary shards,,2.47545,min
3: Max cumulative indexing time across primary shards,,2.7270833333333333,min
3: Cumulative indexing throttle time of primary shards,,0,min
3: Min cumulative indexing throttle time across primary shards,,0,min
3: Median cumulative indexing throttle time across primary shards,,0,min
3: Max cumulative indexing throttle time across primary shards,,0,min
3: Cumulative merge time of primary shards,,0.005716666666666667,min
3: Cumulative merge count of primary shards,,5,
3: Min cumulative merge time across primary shards,,0.0004333333333333333,min
3: Median cumulative merge time across primary shards,,0.00065,min
3: Max cumulative merge time across primary shards,,0.0027166666666666667,min
3: Cumulative merge throttle time of primary shards,,0,min
3: Min cumulative merge throttle time across primary shards,,0,min
3: Median cumulative merge throttle time across primary shards,,0,min
3: Max cumulative merge throttle time across primary shards,,0,min
3: Cumulative refresh time of primary shards,,0.4688333333333333,min
3: Cumulative refresh count of primary shards,,40,
3: Min cumulative refresh time across primary shards,,0.08556666666666667,min
3: Median cumulative refresh time across primary shards,,0.09646666666666667,min
3: Max cumulative refresh time across primary shards,,0.0972,min
3: Cumulative flush time of primary shards,,2.03135,min
3: Cumulative flush count of primary shards,,15,
3: Min cumulative flush time across primary shards,,0.39188333333333336,min
3: Median cumulative flush time across primary shards,,0.4111333333333333,min
3: Max cumulative flush time across primary shards,,0.42086666666666667,min
3: Total Young Gen GC time,,1.502,s
3: Total Young Gen GC count,,47,
3: Total Old Gen GC time,,0,s
3: Total Old Gen GC count,,0,
3: Store size,,3.163294860161841,GB
3: Translog size,,2.561137080192566e-07,GB
3: Heap used for segments,,0,MB
3: Heap used for doc values,,0,MB
3: Heap used for terms,,0,MB
3: Heap used for norms,,0,MB
3: Heap used for points,,0,MB
3: Heap used for stored fields,,0,MB
3: Segment count,,97,
3: error rate,index-append,0.00,%
3: Min Throughput,polygon,2.00,ops/s
3: Mean Throughput,polygon,2.00,ops/s
3: Median Throughput,polygon,2.00,ops/s
3: Max Throughput,polygon,2.01,ops/s
3: 50th percentile latency,polygon,34.06202851328999,ms
3: 90th percentile latency,polygon,35.03835282754153,ms
3: 99th percentile latency,polygon,36.454293074784836,ms
3: 100th percentile latency,polygon,43.34578406997025,ms
3: 50th percentile service time,polygon,32.78130997205153,ms
3: 90th percentile service time,polygon,33.77172515029088,ms
3: 99th percentile service time,polygon,34.83832438359972,ms
3: 100th percentile service time,polygon,41.91419004928321,ms
3: error rate,polygon,0.00,%
3: Min Throughput,bbox,2.00,ops/s
3: Mean Throughput,bbox,2.01,ops/s
3: Median Throughput,bbox,2.01,ops/s
3: Max Throughput,bbox,2.01,ops/s
3: 50th percentile latency,bbox,42.071087984368205,ms
3: 90th percentile latency,bbox,44.00668186135591,ms
3: 99th percentile latency,bbox,54.396707259584225,ms
3: 100th percentile latency,bbox,55.51117891445756,ms
3: 50th percentile service time,bbox,40.859389933757484,ms
3: 90th percentile service time,bbox,42.89666726253927,ms
3: 99th percentile service time,bbox,53.70071197277866,ms
3: 100th percentile service time,bbox,53.93424991052598,ms
3: error rate,bbox,0.00,%
3: Min Throughput,distance,5.01,ops/s
3: Mean Throughput,distance,5.01,ops/s
3: Median Throughput,distance,5.01,ops/s
3: Max Throughput,distance,5.01,ops/s
3: 50th percentile latency,distance,11.564237996935844,ms
3: 90th percentile latency,distance,12.179009604733437,ms
3: 99th percentile latency,distance,14.521090824855493,ms
3: 100th percentile latency,distance,14.75015701726079,ms
3: 50th percentile service time,distance,10.548858088441193,ms
3: 90th percentile service time,distance,11.092877946794033,ms
3: 99th percentile service time,distance,13.455560109578073,ms
3: 100th percentile service time,distance,13.488444034010172,ms
3: error rate,distance,0.00,%
3: Min Throughput,distanceRange,0.50,ops/s
3: Mean Throughput,distanceRange,0.50,ops/s
3: Median Throughput,distanceRange,0.50,ops/s
3: Max Throughput,distanceRange,0.50,ops/s
3: 50th percentile latency,distanceRange,1219.5492314640433,ms
3: 90th percentile latency,distanceRange,1234.1301434440538,ms
3: 99th percentile latency,distanceRange,1242.6772359397728,ms
3: 100th percentile latency,distanceRange,1252.7494929963723,ms
3: 50th percentile service time,distanceRange,1217.9977719206363,ms
3: 90th percentile service time,distanceRange,1232.5072522158735,ms
3: 99th percentile service time,distanceRange,1241.182112590177,ms
3: 100th percentile service time,distanceRange,1250.9003090672195,ms
3: error rate,distanceRange,0.00,%
3:
3: [WARNING] No throughput metrics available for [index-append]. Likely cause: The benchmark ended already during warmup.
3:
3: [INFO] Race id is [7511a44f-8e47-4070-ae1d-41546331e905]
3:
3: ----------------------------------
3: [INFO] SUCCESS (took 1133 seconds)
3: ----------------------------------

amitsa · June 13, 2022, 10:41am

Any reply on this? It has been two days and I am not able to figure out what is happening.

Hi @json Jason Bryan, can you help me with this issue? I have been stuck for 3 days.

Quentin_Pradet · June 13, 2022, 12:11pm

Hello, this is surprising. Have you tried reproducing this outside of your Kubernetes environment? What is the Rally invocation? Can you please share the logs?

amitsa · June 13, 2022, 1:17pm

Hi @Quentin_Pradet,

I haven't tried it outside the Kubernetes environment. I have an Elastic cluster set up on Kubernetes.
The invocation command is as below:

ELASTIC_EP=https://es-master:9200
CLIENT_OPTIONS="basic_auth_user:rally,basic_auth_password:changeme,timeout:120,use_ssl:true,verify_certs:false,ca_certs:/rally/cacert.pem"
echo "${ES_RALLY_RACE_params_json}  ${ES_RALLY_RACE} ${ELASTIC_EP} ${ES_TESTMODE} ${CLIENT_OPTIONS}"

echo "${ES_NO_OF_SHARDS} ${ES_NO_OF_REPLICAS} ${ES_INGEST_PERCENTAGE} ${ES_BULKSIZE} ${ES_BULK_INDEXING_CLIENT} ${ES_REFRESH_INTERVAL}"

esrally race --offline --track-params='{"number_of_shards":'${ES_NO_OF_SHARDS}',"number_of_replicas":'${ES_NO_OF_REPLICAS}',"ingest_percentage":'${ES_INGEST_PERCENTAGE}',"bulk_size":'${ES_BULKSIZE}',"bulk_indexing_clients":'${ES_BULK_INDEXING_CLIENT}',"index_settings": { "index.refresh_interval":'${ES_REFRESH_INTERVAL}' }}' --track-path=/rally/.rally/benchmarks/tracks/default/${ES_RALLY_RACE} --pipeline=benchmark-only --target-hosts=${ELASTIC_EP} ${ES_TESTMODE} --client-options ${CLIENT_OPTIONS} --report-format=csv
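The `--track-params` value above is assembled by splicing shell variables into a hand-written JSON string, which is easy to break with a missing quote or a non-numeric value such as `30s`. A minimal sketch of building the same JSON with a serializer instead is shown below. It assumes the `ES_*` variable names from the invocation, and the fallback defaults are taken from the settings listed in the first post; that mapping is an assumption, not something the thread confirms.

```python
import json
import os
import shlex


def _number_or_str(value):
    """Convert '10' -> 10, '0.5' -> 0.5, and leave values like '30s' as strings."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


def build_track_params(env):
    """Build the JSON string passed to --track-params from env-style settings.

    The ES_* names come from the invocation in the post; the defaults are
    assumed to match the values listed in the first post.
    """
    params = {
        "number_of_shards": _number_or_str(env.get("ES_NO_OF_SHARDS", "10")),
        "number_of_replicas": _number_or_str(env.get("ES_NO_OF_REPLICAS", "0")),
        "ingest_percentage": _number_or_str(env.get("ES_INGEST_PERCENTAGE", "100")),
        "bulk_size": _number_or_str(env.get("ES_BULKSIZE", "5000")),
        "bulk_indexing_clients": _number_or_str(env.get("ES_BULK_INDEXING_CLIENT", "8")),
        "index_settings": {
            "index.refresh_interval": _number_or_str(env.get("ES_REFRESH_INTERVAL", "-1")),
        },
    }
    return json.dumps(params)


if __name__ == "__main__":
    # shlex.quote makes the JSON safe to paste into a shell command line.
    print("--track-params=" + shlex.quote(build_track_params(os.environ)))
```

Because the JSON comes from `json.dumps`, a refresh interval such as `30s` is emitted as a quoted string rather than producing invalid JSON.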

The log is large. How can I share it? Let me know.

With geopoint I am getting this error with the default params, and with nyc_taxis I am getting this warning with 16, 32, 64, and 128 clients with a bulk size of 10000.

amitsa · June 15, 2022, 9:06am

Any update on the above issue? I am not able to upload the logs to the Elastic Upload Service, as I cannot log in with the email I am using. Can I share them somewhere else?

Quentin_Pradet · June 15, 2022, 9:39am

Sorry, the correct link is https://upload.elastic.co/u/83648982-d036-4939-ac9c-8aa4a24282dc.

Anyway, we discussed this with @dliappis and it turns out the warning is probably accurate:

[WARNING] No throughput metrics available for [index-append]. Likely cause: The benchmark ended already during warmup.

geopoint is a small dataset (~2.3GiB uncompressed), so on a fast system it can usually be indexed in less than 120s, which is the warmup time: https://github.com/elastic/rally-tracks/blob/688d04ba3f1e3748307ab59fcc1586951ec290f4/geopoint/challenges/default.json#L29
nyc_taxis is larger at ~74.3GiB, but since you are getting the error only when using >=16 clients with a large bulk size of 10000, this is likely the same issue: indexing finished in less than the 240s warmup time: https://github.com/elastic/rally-tracks/blob/688d04ba3f1e3748307ab59fcc1586951ec290f4/nyc_taxis/challenges/default.json#L33
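To make the reasoning concrete, here is a simplified model of why a task that finishes inside its warmup window reports no throughput. It is only a sketch, not Rally's actual implementation: it assumes one sample per second and that samples taken before the warmup period ends are excluded from the report. The durations in the usage note are hypothetical, since the thread does not state how long index-append actually took.

```python
def measured_samples(sample_times_s, warmup_period_s):
    """Keep only the samples taken once the warmup period is over."""
    return [t for t in sample_times_s if t >= warmup_period_s]


def throughput_available(task_duration_s, warmup_period_s, sample_interval_s=1.0):
    """Return True if at least one sample falls outside the warmup window.

    Simplified model (an assumption): one sample per interval for the
    lifetime of the task, timestamps relative to task start.
    """
    n = int(task_duration_s / sample_interval_s)
    times = [i * sample_interval_s for i in range(1, n + 1)]
    return len(measured_samples(times, warmup_period_s)) > 0
```

Under this model, a hypothetical index-append run of 100s with a 120s warmup yields `throughput_available(100, 120) == False`, which corresponds to the warning above, while a 300s run with the same warmup would have measurable samples.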

Can you try reducing the warmup in those tracks and see if you get results?
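Since the invocation above uses `--track-path`, Rally reads the track from that local directory, so one way to try this is to edit the `warmup-time-period` value in the local copy of the challenge file. The sketch below rewrites that value in the file's text. The 30s value and the sample text are hypothetical, and the pattern only matches a literal number; if a track version defines the value through a template expression, it will not match.

```python
import re

# Matches the task property and captures everything up to the number.
WARMUP_PATTERN = re.compile(r'("warmup-time-period"\s*:\s*)\d+')


def set_warmup(challenge_text, seconds):
    """Return challenge_text with every literal warmup-time-period set to `seconds`."""
    return WARMUP_PATTERN.sub(lambda m: m.group(1) + str(seconds), challenge_text)


if __name__ == "__main__":
    # Hypothetical excerpt of a challenge file, for illustration only.
    sample = '{"operation": "index-append", "warmup-time-period": 120, "clients": 8}'
    print(set_warmup(sample, 30))
```

To apply it, read the challenge file as text, pass it through `set_warmup`, and write the result back. The function works on plain text rather than parsed JSON because the track files are templates and may not parse as JSON.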

amitsa · June 15, 2022, 10:56am

Hi @Quentin_Pradet

Thank you for your quick response.
I had the same suspicion as you mentioned. So reducing the warmup in those tracks would impact the results.

I will test it and share the details.

dliappis · June 15, 2022, 11:24am

Additionally you should consider whether these standard workloads are suitable for what you want to benchmark. Are they representative of the way your organization uses the Elastic stack? One indication is exactly what you just saw i.e. that the warmup time is more than the actual time taken to index, therefore the datasets might be too small, at least for indexing; you'd need to consider the size of your cluster in relation to the size of the workload.

I strongly recommend watching this talk and working on creating a dataset that is representative of your own use case.

amitsa · June 16, 2022, 9:56am

Hi @dliappis ,

Thank you for the response.

As you mentioned above, the indication that the warmup time is longer than the time taken to index the data is true.

I am looking to see the performance of the server, such as CPU utilization, memory usage, and disk usage, on different platforms while benchmarking Elasticsearch.

I am trying to figure out the saturation point for CPU usage, memory usage, and disk usage.

Kindly suggest which track would be good enough to do so.

dliappis · June 16, 2022, 11:11am

If I understood correctly, what you are trying to do, i.e. explore the saturation point of hardware resources, doesn't seem like a good fit for a macrobenchmarking tool like Rally, but rather for a hardware benchmarking suite such as the fio benchmarking suite for the I/O side, etc.

If you are trying to understand whether server X is better than server Y for the kind of workload that Elasticsearch is serving in your organization, then, as I mentioned earlier, you will need to create your own Rally track that models your use case. Then by running your track and analyzing the resource usage on your servers you can explore whether you can achieve better metrics (e.g. median indexing throughput or lower latency for your queries) and/or satisfy your SLOs as well as understand the bottleneck of your benchmark.

