Esrally gets stuck on check-cluster-health

Hi,

I'm running esrally locally against a single node. My esrally version is 2.3.0. When I run the command below, execution hangs while checking the cluster status, i.e. in check-cluster-health.

$ esrally race --distribution-version=7.16.0 --track=nyc_taxis --challenge=append-no-conflicts

Checking the health gives a green result:
$ curl http://localhost:39200/_cat/health?v
epoch      timestamp cluster         status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1638996487 20:48:07  rally-benchmark green           1         1      2   2    0    0        0             0                  -               100.0%

But what I see in rally.log is:
2021-12-08 20:47:15,330 -not-actor-/PID:92289 Elasticsearch WARNING GET http://127.0.0.1:39200/_cluster/health/nyc_taxis?wait_for_status=green&wait_for_no_relocating_shards=false [status:408 request:30.003s]

The answers to other similar questions did not fix the issue for me.

Thanks for the help.

Hi Alpha,

Would you mind running your curl check using the same endpoint as Rally? I.e.,

curl 'http://127.0.0.1:39200/_cluster/health/nyc_taxis?wait_for_status=green&wait_for_no_relocating_shards=false'

Also, make sure there are no leftover processes from previous esrally executions.
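
One possible way to look for leftovers is sketched below. It assumes `pgrep` and `lsof` are installed and that the node listens on port 39200, as in the log line above; the `leftovers` function name is only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report Elasticsearch processes and listeners
# that may have survived an earlier esrally run.
leftovers() {
  # Any process whose command line mentions elasticsearch.
  pgrep -fl elasticsearch || echo "no elasticsearch processes found"
  # Anything listening on the port Rally uses in the log above (assumed 39200).
  lsof -nP -iTCP:39200 -sTCP:LISTEN || echo "nothing listening on 39200"
}
```

Call `leftovers` before starting a new race. If it lists a process you did not expect, stopping it before the next run rules out one source of confusion.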

Can you share more of rally.log, preferably the complete file and any customized configurations?

GET /_cluster/health/<index> can time out like this (the 408 in your log) when the index does not exist or can never become green. _cat/health returns green because it reports the status of all known indices in the cluster, not of nyc_taxis specifically.

Some possibilities:

  • The nyc_taxis index exists but is yellow. This will happen if the number of index replicas has been configured to be >0 on a single-node cluster.
  • nyc_taxis does not exist and is therefore reported as red.
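
A minimal sketch for telling these two cases apart, assuming the node answers on http://127.0.0.1:39200 as in the log line above. The `index_state` and `check_index` names are hypothetical helpers, not part of Rally.

```shell
#!/usr/bin/env bash
ES_URL="${ES_URL:-http://127.0.0.1:39200}"   # assumed from the rally.log line
INDEX="${INDEX:-nyc_taxis}"

# Map the HTTP status of GET /<index> to a readable state.
index_state() {
  case "$1" in
    200) echo "exists" ;;
    404) echo "missing" ;;
    *)   echo "unknown" ;;
  esac
}

check_index() {
  local code
  # Only the status code is needed here, so the response body is discarded.
  code=$(curl -s -o /dev/null -w '%{http_code}' "$ES_URL/$INDEX")
  echo "$INDEX: $(index_state "$code")"
  # If the index exists, show its health and replica count.
  # rep > 0 on a single node leaves the index yellow.
  if [ "$code" = "200" ]; then
    curl -s "$ES_URL/_cat/indices/$INDEX?v&h=index,health,pri,rep"
  fi
}
```

Running `check_index` while the race is stuck should show which of the two cases applies. An "unknown" result usually means the node could not be reached at all.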

See also Rally race gets stuck on check-cluster-health.

$ curl -X GET http://127.0.0.1:39200/_cluster/health/nyc_taxis
{"cluster_name":"rally-benchmark","status":"red","timed_out":true,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":0,"active_shards":0,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":0,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":100.0}

The first time I run esrally, there is no issue. But when I run it again (after the first run has completed cleanly), the issue appears. I'm using the default configuration and have not changed anything. In other words, to reproduce the problem, I only have to run the same command twice.

$ esrally race --distribution-version=7.16.0 --track=nyc_taxis --challenge=append-no-conflicts
... completes successfully.
$ esrally race --distribution-version=7.16.0 --track=nyc_taxis --challenge=append-no-conflicts
... waits forever for check-cluster-health

Since running the whole nyc_taxis challenge is slow on my laptop, I added --test-mode to your commands, and I could not reproduce the issue. Do you still see the issue when you add --test-mode?

Also, what do you call "the first run"? What do you need to reset to get into "first run" conditions?

Is there a reason why you could not share your Rally configuration and logs, as requested by Jason?
