Rally race gets stuck on check-cluster-health

When I use a track from the repository, it works without any problem.
Now I have created a track from an existing cluster, and the race hangs at the 'check-cluster-health' stage.

I am using esrally version 2.0.4
Elasticsearch version: 6.8.1

> 
> esrally race --pipeline=benchmark-only --track-path=~/tracks/testtrack/ --target-hosts=127.0.0.1:9200  --kill-running-processes --test-mode
> 
>     ____        ____
>    / __ \____ _/ / /_  __
>   / /_/ / __ `/ / / / / /
>  / _, _/ /_/ / / / /_/ /
> /_/ |_|\__,_/_/_/\__, /
>                 /____/
> 
> [INFO] Racing on track [testtrack] and car ['external'] with version [6.8.1].
> 
> [WARNING] indexing_total_time is 59 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
> Running delete-index                                                           [100% done]
> Running create-index                                                           [100% done]
> Running cluster-health                                                         [  0% done]
> 
> 
> Here is the Rally log.
> 2021-03-30 15:28:43,930 ActorAddr-(T|:46065)/PID:8145 esrally.client INFO Creating ES client connected to [{'host': '127.0.0.1', 'port': 9200}] with options [{'timeout': 60, 'max_connections': 1}]
> 2021-03-30 15:28:43,934 ActorAddr-(T|:44123)/PID:8146 esrally.actor INFO Worker[1] reached join point at index [6].
> 2021-03-30 15:28:43,936 ActorAddr-(T|:41685)/PID:8147 esrally.actor INFO Worker[2] is continuing its work at task index [4] on [104.595866], that is in [-1 day, 23:59:59.985102].
> 2021-03-30 15:28:43,930 ActorAddr-(T|:46065)/PID:8145 esrally.client INFO SSL support: off
> 2021-03-30 15:28:43,936 ActorAddr-(T|:41685)/PID:8147 esrally.actor INFO Worker[2] reached join point at index [6].
> 2021-03-30 15:28:43,930 ActorAddr-(T|:46065)/PID:8145 esrally.client INFO HTTP basic authentication: off
> 2021-03-30 15:28:43,930 ActorAddr-(T|:46065)/PID:8145 esrally.client INFO HTTP compression: off
> 2021-03-30 15:28:43,927 ActorAddr-(T|:34987)/PID:8139 esrally.driver.driver INFO Scheduling next task for worker id [1] at their timestamp [104.596997] (master timestamp [104.601953])
> 2021-03-30 15:28:43,931 ActorAddr-(T|:46065)/PID:8145 esrally.driver.driver INFO Task assertions enabled: False
> 2021-03-30 15:28:43,931 ActorAddr-(T|:46065)/PID:8145 esrally.driver.driver INFO Choosing [unthrottled] for [cluster-health].
> 2021-03-30 15:28:43,931 ActorAddr-(T|:46065)/PID:8145 esrally.driver.driver INFO Creating iteration-count based schedule with [None] distribution for [cluster-health] with [0] warmup iterations and [1] iterations.
> 2021-03-30 15:28:43,931 ActorAddr-(T|:46065)/PID:8145 esrally.driver.driver INFO iteration-count-based schedule will determine when the schedule for [cluster-health] terminates.
> 2021-03-30 15:28:43,927 ActorAddr-(T|:34987)/PID:8139 esrally.driver.driver INFO Scheduling next task for worker id [2] at their timestamp [104.595866] (master timestamp [104.601953])
> 2021-03-30 15:28:43,928 ActorAddr-(T|:34987)/PID:8139 esrally.driver.driver INFO Scheduling next task for worker id [3] at their timestamp [104.597844] (master timestamp [104.601953])
> 2021-03-30 15:28:43,932 ActorAddr-(T|:34987)/PID:8139 esrally.driver.driver INFO [1/4] workers reached join point [3/4].
> 2021-03-30 15:28:43,940 ActorAddr-(T|:34987)/PID:8139 esrally.driver.driver INFO [2/4] workers reached join point [3/4].
> 2021-03-30 15:28:43,940 ActorAddr-(T|:34987)/PID:8139 esrally.driver.driver INFO [3/4] workers reached join point [3/4].
> 2021-03-30 15:29:13,947 -not-actor-/PID:8145 elasticsearch WARNING GET http://127.0.0.1:9200/_cluster/health/testtrack-v4?wait_for_status=green&wait_for_no_relocating_shards=true [status:408 request:30.015s]

The cluster health is currently yellow, but it looks like it has to be green for the test to complete? I keep getting the same warning, and even after I killed the process, it kept generating the same warning.
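
The last log line is consistent with that reading: the request asks for `wait_for_status=green` and comes back with a 408 after about 30 seconds. The following is only a minimal sketch of that logic, not Rally's actual code; `health_path` and `status_reached` are hypothetical helper names introduced for illustration, and the ordering red < yellow < green follows the Elasticsearch cluster health levels.

```python
from urllib.parse import urlencode

# Health levels ordered from worst to best.
HEALTH_LEVELS = {"red": 0, "yellow": 1, "green": 2}


def health_path(index, wait_for_status="green"):
    """Build a request path shaped like the one in the log (hypothetical helper)."""
    params = {
        "wait_for_status": wait_for_status,
        "wait_for_no_relocating_shards": "true",
    }
    return f"/_cluster/health/{index}?{urlencode(params)}"


def status_reached(current, wanted):
    """Return True if the current status is at least as good as the wanted one."""
    return HEALTH_LEVELS[current] >= HEALTH_LEVELS[wanted]


if __name__ == "__main__":
    print(health_path("testtrack-v4"))
    # A yellow cluster never satisfies a wait for green, so the request times out.
    print(status_reached("yellow", "green"))
    print(status_reached("yellow", "yellow"))
```

Under these assumptions, a cluster that stays yellow can only satisfy the wait if the requested status is lowered to yellow.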

Is there a workaround for this?
Thanks in advance.

Hello,

Thank you for your interest in Rally! Depending on what you have in your test track, and if you modelled it on the existing standard tracks (e.g. geonames), there is a way to specify which status to wait for: --track-params="cluster_health:'yellow'".
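
As a rough sketch, the parameter can be combined with the command from the original post like this. The helper `race_command` is hypothetical and exists only for illustration. It assumes the track actually reads a `cluster_health` parameter, as the standard tracks do, and it leaves out `--test-mode` and `--kill-running-processes` for brevity.

```python
import os
import shlex


def race_command(track_path, target_hosts, cluster_health="yellow"):
    """Return the esrally argument list with the cluster_health track parameter set."""
    return [
        "esrally", "race",
        "--pipeline=benchmark-only",
        # No shell is involved when the list is passed to subprocess, so expand ~ here.
        f"--track-path={os.path.expanduser(track_path)}",
        f"--target-hosts={target_hosts}",
        # The single quotes around the value are part of the argument itself.
        f"--track-params=cluster_health:'{cluster_health}'",
    ]


if __name__ == "__main__":
    cmd = race_command("~/tracks/testtrack/", "127.0.0.1:9200")
    # Print a shell-quoted version for copy and paste (requires Python 3.8+).
    print(shlex.join(cmd))
    # To actually start the race: subprocess.run(cmd, check=True)
```

If the track does not reference `cluster_health` anywhere, the parameter will probably have no effect on the wait condition, and the status would need to be changed in the track's cluster-health operation itself.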

However, please note that running performance tests on a non-green cluster may produce unstable, non-reproducible results. Also, --test-mode is great for debugging a track, but it should not be used for actual performance benchmarks.

Thanks,
Evgenia

That did the trick; the race now proceeds to the next task. Thanks for the quick response.

Noted regarding --test-mode. I thought that adding --test-mode would abort the hanging state, but the behavior was the same. And yes, of course I will not use it for the real benchmark.

And yes, when I use track=geonames, the race completes without any issues.