Esrally gets stuck on check-cluster-health

vtex · December 8, 2021, 9:09pm

Hi,

I'm running esrally locally against a single node. My esrally version is 2.30. When I run the command below, execution hangs while checking the cluster status, i.e. in check-cluster-health.

$ esrally race --distribution-version=7.16.0 --track=nyc_taxis --challenge=append-no-conflicts

Checking the health gives a green result:
$ curl http://localhost:39200/_cat/health?v
epoch      timestamp cluster         status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1638996487 20:48:07  rally-benchmark green           1         1      2   2    0    0        0             0                  -                100.0%

But what I see in rally.log is:
2021-12-08 20:47:15,330 -not-actor-/PID:92289 Elasticsearch WARNING GET http://127.0.0.1:39200/_cluster/health/nyc_taxis?wait_for_status=green&wait_for_no_relocating_shards=false [status:408 request:30.003s]

The answers to other similar questions didn't fix the issue for me.

Thanks for the help.

json · December 8, 2021, 10:38pm

Hi Alpha,

Would you mind running your curl check using the same endpoint as Rally? I.e.,

curl 'http://127.0.0.1:39200/_cluster/health/nyc_taxis?wait_for_status=green&wait_for_no_relocating_shards=false'
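
If you would rather script the check, here is a minimal Python sketch (standard library only) that builds the same URL Rally logged and fetches it. The function names health_url and fetch_health are made up for illustration, and the host and port defaults are simply the ones from this thread.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def health_url(index, host="127.0.0.1", port=39200):
    """Build the index-level health URL with the parameters seen in rally.log."""
    params = urllib.parse.urlencode({
        "wait_for_status": "green",
        "wait_for_no_relocating_shards": "false",
    })
    return f"http://{host}:{port}/_cluster/health/{urllib.parse.quote(index)}?{params}"


def fetch_health(url, http_timeout=60):
    """Return (HTTP status, parsed JSON body) for a health request."""
    try:
        with urllib.request.urlopen(url, timeout=http_timeout) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        # A 408 (wait timed out) still carries the health document as its body.
        return err.code, json.load(err)
```

Calling fetch_health(health_url("nyc_taxis")) against the running node should show the same 408 that Rally logs if the wait never succeeds.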

Also, make sure there are no other process leftovers from other esrally executions.

Can you share more of rally.log, preferably the complete file and any customized configurations?

GET /_cluster/health/<index> can behave this way for non-existing indices or if the index will never be green. _cat/health returns green because it represents the status of all known indices in the cluster.

Some possibilities:

The nyc_taxis index exists but is yellow. This happens if the number of index replicas is configured to be greater than 0 on a single-node cluster, because the replicas can never be assigned.
nyc_taxis does not exist and is therefore reported as red.
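
To tell these cases apart from the JSON that the index-level health endpoint returns, a rough heuristic could look like the sketch below. The function name diagnose_index_health and the returned labels are hypothetical, and treating "red with zero shards of any kind" as "index missing" is an assumption based on the possibilities listed above, not something Rally itself does.

```python
# Shard counters reported by GET /_cluster/health/<index>.
SHARD_COUNTERS = (
    "active_shards",
    "relocating_shards",
    "initializing_shards",
    "unassigned_shards",
)


def diagnose_index_health(health):
    """Map a parsed index-level health response to a likely cause (heuristic)."""
    status = health.get("status")
    if status == "green":
        return "ok"
    if status == "yellow":
        # All primaries are assigned but some replicas are not.
        return "replicas_unassigned"
    if status == "red":
        # Assumption: no shards of any kind means the index was not found.
        if sum(health.get(key, 0) for key in SHARD_COUNTERS) == 0:
            return "index_missing"
        return "primary_unassigned"
    return "unknown"


# Illustrative input only, not taken from a real cluster.
print(diagnose_index_health({"status": "yellow", "active_shards": 1, "unassigned_shards": 1}))
```

In the yellow case, checking the index's number_of_replicas setting would be the next step; in the missing case, it is worth finding out why the index was never created or was deleted before the health check ran.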

See also Rally race gets stuck on check-cluster-health.

vtex · December 9, 2021, 12:18am

$ curl -X GET http://127.0.0.1:39200/_cluster/health/nyc_taxis
{"cluster_name":"rally-benchmark","status":"red","timed_out":true,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":0,"active_shards":0,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":0,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":100.0}

The first time I run esrally, there is no issue. But when I run it again (after the first run has completed cleanly), the issue appears. I'm using the default configuration and have not changed anything. In other words, all I have to do to reproduce the problem is run it twice.

$ esrally race --distribution-version=7.16.0 --track=nyc_taxis --challenge=append-no-conflicts
... completes successfully.
$ esrally race --distribution-version=7.16.0 --track=nyc_taxis --challenge=append-no-conflicts
... waits forever for check-cluster-health

Quentin_Pradet · December 14, 2021, 6:32am

Since running the whole nyc_taxis challenge is slow on my laptop, I added --test-mode to your commands, and I fail to reproduce the issue. Do you also have the issue when you add --test-mode?

Also, what do you call "the first run"? What do you need to reset to get into "first run" conditions?

Is there a reason why you could not share your Rally configuration and logs, as requested by Jason?

system · January 11, 2022, 6:33am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
