Rally race gets stuck on check-cluster-health

Ellie_Bird · March 30, 2021, 3:41pm

When I use a track from the repository, it works without any problem.
Now I have created a track from an existing cluster, and the race hangs at the 'check-cluster-health' stage.

I am using esrally version 2.0.4
elasticsearch version : 6.8.1

> 
> esrally race --pipeline=benchmark-only --track-path=~/tracks/testtrack/ --target-hosts=127.0.0.1:9200  --kill-running-processes --test-mode
> 
>     ____        ____
>    / __ \____ _/ / /_  __
>   / /_/ / __ `/ / / / / /
>  / _, _/ /_/ / / / /_/ /
> /_/ |_|\__,_/_/_/\__, /
>                 /____/
> 
> [INFO] Racing on track [testtrack] and car ['external'] with version [6.8.1].
> 
> [WARNING] indexing_total_time is 59 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
> Running delete-index                                                           [100% done]
> Running create-index                                                           [100% done]
> Running cluster-health                                                         [  0% done]
> 
> 
> Here is the rally log. 
> 2021-03-30 15:28:43,930 ActorAddr-(T|:46065)/PID:8145 esrally.client INFO Creating ES client connected to [{'host': '127.0.0.1', 'port': 9200}] with options [{'timeout': 60, 'max_connections': 1}]
> 2021-03-30 15:28:43,934 ActorAddr-(T|:44123)/PID:8146 esrally.actor INFO Worker[1] reached join point at index [6].
> 2021-03-30 15:28:43,936 ActorAddr-(T|:41685)/PID:8147 esrally.actor INFO Worker[2] is continuing its work at task index [4] on [104.595866], that is in [-1 day, 23:59:59.985102].
> 2021-03-30 15:28:43,930 ActorAddr-(T|:46065)/PID:8145 esrally.client INFO SSL support: off
> 2021-03-30 15:28:43,936 ActorAddr-(T|:41685)/PID:8147 esrally.actor INFO Worker[2] reached join point at index [6].
> 2021-03-30 15:28:43,930 ActorAddr-(T|:46065)/PID:8145 esrally.client INFO HTTP basic authentication: off
> 2021-03-30 15:28:43,930 ActorAddr-(T|:46065)/PID:8145 esrally.client INFO HTTP compression: off
> 2021-03-30 15:28:43,927 ActorAddr-(T|:34987)/PID:8139 esrally.driver.driver INFO Scheduling next task for worker id [1] at their timestamp [104.596997] (master timestamp [104.601953])
> 2021-03-30 15:28:43,931 ActorAddr-(T|:46065)/PID:8145 esrally.driver.driver INFO Task assertions enabled: False
> 2021-03-30 15:28:43,931 ActorAddr-(T|:46065)/PID:8145 esrally.driver.driver INFO Choosing [unthrottled] for [cluster-health].
> 2021-03-30 15:28:43,931 ActorAddr-(T|:46065)/PID:8145 esrally.driver.driver INFO Creating iteration-count based schedule with [None] distribution for [cluster-health] with [0] warmup iterations and [1] iterations.
> 2021-03-30 15:28:43,931 ActorAddr-(T|:46065)/PID:8145 esrally.driver.driver INFO iteration-count-based schedule will determine when the schedule for [cluster-health] terminates.
> 2021-03-30 15:28:43,927 ActorAddr-(T|:34987)/PID:8139 esrally.driver.driver INFO Scheduling next task for worker id [2] at their timestamp [104.595866] (master timestamp [104.601953])
> 2021-03-30 15:28:43,928 ActorAddr-(T|:34987)/PID:8139 esrally.driver.driver INFO Scheduling next task for worker id [3] at their timestamp [104.597844] (master timestamp [104.601953])
> 2021-03-30 15:28:43,932 ActorAddr-(T|:34987)/PID:8139 esrally.driver.driver INFO [1/4] workers reached join point [3/4].
> 2021-03-30 15:28:43,940 ActorAddr-(T|:34987)/PID:8139 esrally.driver.driver INFO [2/4] workers reached join point [3/4].
> 2021-03-30 15:28:43,940 ActorAddr-(T|:34987)/PID:8139 esrally.driver.driver INFO [3/4] workers reached join point [3/4].
> 2021-03-30 15:29:13,947 -not-actor-/PID:8145 elasticsearch WARNING GET http://127.0.0.1:9200/_cluster/health/testtrack-v4?wait_for_status=green&wait_for_no_relocating_shards=true [status:408 request:30.015s]

The cluster health is currently yellow, but it looks like it has to be green for the test to complete? I keep getting the same warning, and even after I killed the process, it keeps generating the same warning.

Is there a workaround for this?
Thanks in advance.

Evgenia_Badiyanova · March 30, 2021, 4:13pm

Hello,

Thank you for your interest in Rally! Depending on what you have in your test track, and if you modelled it on the existing standard tracks (e.g. geonames), there is a way to specify which status to wait for: --track-params="cluster_health:'yellow'".

However, please note that running performance tests on a non-green cluster may produce non-stable/non-reproducible results. Also, --test-mode is great to use when debugging a track, but it should not be used for the actual performance benchmarks.
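
To see which status the cluster can actually reach before choosing a value, a minimal Python sketch along these lines may help. It assumes the track reads a `cluster_health` parameter the way the standard tracks do; the helper names `fetch_health` and `track_params_for` and the sample response are made up for illustration and are not part of Rally. A yellow status often means replica shards cannot be assigned, for example on a single-node cluster, and the 408 after roughly 30 seconds in the log is consistent with the health API timing out while waiting for green.

```python
import json
import urllib.request


def fetch_health(base_url, index):
    """GET /_cluster/health/<index> without wait_for_status, so it returns at once."""
    url = f"{base_url}/_cluster/health/{index}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def track_params_for(health):
    """Build a --track-params value matching the cluster's current status.

    Assumes the track honours a `cluster_health` parameter.
    """
    status = health["status"]
    if status == "red":
        raise ValueError("cluster is red; fix it before benchmarking")
    return f"cluster_health:'{status}'"


# Hypothetical response for a single-node cluster with unassigned replicas.
sample = {"status": "yellow", "number_of_nodes": 1, "unassigned_shards": 5}
print(track_params_for(sample))
```

Against the cluster from the log above, `fetch_health("http://127.0.0.1:9200", "testtrack-v4")` would return the live response to pass to `track_params_for`.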

Thanks,
Evgenia

Ellie_Bird · March 30, 2021, 4:22pm

That did the trick; the race now passes on to the next task. Thanks for the quick response.

Ellie_Bird · March 30, 2021, 4:23pm

Noted regarding --test-mode. I thought that adding --test-mode would abort the hanging state, but the behavior was the same. Of course, I will not use it for the real benchmark.

Ellie_Bird · March 30, 2021, 4:24pm

Yes, when I use track=geonames, the race completes without any issues.

