esrally stuck while running against an already existing cluster

I'm trying to start esrally using the Docker-based installation with the following command:

docker run elastic/rally --track=nyc_taxis --test-mode --pipeline=benchmark-only --target-hosts=

I got the output below, and then esrally got stuck with these messages:

    ____        ____
   / __ \____ _/ / /_  __
  / /_/ / __ `/ / / / / /
 / _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
                /____/

[INFO] Downloading data for track nyc_taxis (30.6 kB total size)                  [100.0%]
[INFO] Decompressing track data from [/rally/.rally/benchmarks/data/nyc_taxis/documents-1k.json.bz2] to [/rally/.rally/benchmarks/data/nyc_taxis/documents-1k.json] ... [OK]
[INFO] Preparing file offset table for [/rally/.rally/benchmarks/data/nyc_taxis/documents-1k.json] ...
[WARNING] merges_total_time is 11448170359 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] merges_total_throttled_time is 9180486646 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] indexing_total_time is 2726607864 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] refresh_total_time is 325159125 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] flush_total_time is 31291493 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[OK]
Running delete-index                                                           [100% done]
Running create-index                                                           [100% done]
Running check-cluster-health                                                   [100% done]
Running index                                                                  [100% done]
Running refresh-after-index                                                    [100% done]
Running force-merge                                                            [100% done]
Running refresh-after-force-merge                                              [100% done]
Running wait-until-merges-finish                                               [  0% done]

Can anybody please help me understand what is wrong?

Thanks in advance.

Rally is running a force merge operation and waiting for it to finish. A force merge takes all of the segments in your index and merges them down into one big segment. This can take a long time if your cluster is small or if your track is quite large. As you can see from the I/O chart in our nightly benchmarks for nyc_taxis, the index is around 250 GB, which will definitely take some time.

If you think it's taking way too long, it might be time to look at your existing Elasticsearch cluster to see what is going on there. The Rally codebase shows what we execute to check whether the Elasticsearch cluster is still waiting on the force merge to complete. You should also look in your logs for any errors that might have caused the force merge to fail while leaving tasks around.
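For reference, my understanding is that Rally's wait-until-merges-finish step essentially polls the nodes stats API (GET _nodes/stats/indices/merges) and waits until no merges are in flight on any node. Here is a minimal sketch of that check; the response shape mirrors the nodes stats API, and the node IDs and numbers are illustrative, not taken from your cluster:

```python
# Sketch of the check Rally performs for wait-until-merges-finish:
# poll nodes stats and wait until no merges are in flight anywhere.
# The dict below mirrors the shape of GET _nodes/stats/indices/merges.

def merges_in_flight(nodes_stats):
    """Sum the 'current' merge count across all nodes in a nodes-stats response."""
    return sum(
        node["indices"]["merges"]["current"]
        for node in nodes_stats["nodes"].values()
    )

# Illustrative response: one node still has two merges running.
sample = {
    "nodes": {
        "abc123": {"indices": {"merges": {"current": 2, "total_time_in_millis": 11448170359}}},
        "def456": {"indices": {"merges": {"current": 0, "total_time_in_millis": 0}}},
    }
}

print(merges_in_flight(sample))  # 2 merges still in flight, so the step keeps waiting
```

If this count stays above zero for a long time, the cluster really is still merging (or a merge task is wedged), which matches what Rally is waiting on.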


Strange, because we have force-merge tasks executed by Curator, but they were definitely not running while I was trying to benchmark with Rally.
I've checked this using the Tasks API:

curl -s 192.168.56.94:9200/_tasks?pretty | grep -i forcemerge

and got nothing.
The cluster is green, and nyc_taxis gets created:
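As a side note, the force-merge task action should show up in the Tasks API as indices:admin/forcemerge, so grepping for "forcemerge" ought to catch it. If you prefer to filter programmatically rather than with grep, a small sketch along these lines would do it; the response shape mirrors GET _tasks?detailed, and the node/task IDs are made up for illustration:

```python
# Filter a _tasks response for running force-merge tasks instead of grepping.
# Force merge tasks have the action name "indices:admin/forcemerge".

def forcemerge_tasks(tasks_response):
    """Return (task_id, description) pairs for any running force-merge tasks."""
    found = []
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            if "forcemerge" in task.get("action", ""):
                found.append((task_id, task.get("description", "")))
    return found

# Illustrative response with one force-merge task in flight.
sample = {
    "nodes": {
        "abc123": {
            "tasks": {
                "abc123:42": {
                    "action": "indices:admin/forcemerge",
                    "description": "Force-merge indices [nyc_taxis]",
                }
            }
        }
    }
}

print(forcemerge_tasks(sample))
```

An empty list here, as in your case, means no force merge is registered as a task on any node.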

curl -s 192.168.56.94:9200/_cat/indices  | grep nyc
green open nyc_taxis             gt9K36KFQEaepRtx1MFZyA  1 0       1000 0 182.6kb 182.6kb

I don't see any issues in the cluster masters' logs or on the node where the index was created, except for these messages about index creation:

[2020-11-05T20:43:13,220][INFO ][o.e.c.m.MetaDataCreateIndexService] [rix3-elkm1-dr] [nyc_taxis] creating index, cause [api], templates [], shards [1]/[0], mappings [type]
[2020-11-05T20:43:14,728][INFO ][o.e.c.r.a.AllocationService] [rix3-elkm1-dr] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[nyc_taxis][0]] ...]).

By the way, I don't know if it has any impact, but I ran the test using the Docker image of esrally and used a coordinator node to access the cluster.

By the way, I was able to run esrally when I used the non-Docker installation (plain pip).

OK, if you do not have any force-merge tasks, it's probably worth looking at what the Rally logs say about the cluster during this time period. If you can do a clean run of Rally and attach the logs, we can inspect them and see what the issue might be.

Also, what version of Rally are you running?

The one I had a problem with was the Docker image with the latest tag:

$ esrally

    ____        ____
   / __ \____ _/ / /_  __
  / /_/ / __ `/ / / / / /
 / _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
                /____/

[ERROR] Cannot race. Only the [benchmark-only] pipeline is supported by the Rally Docker image.
Add --pipeline=benchmark-only in your Rally arguments and try again.
For more details read the docs for the benchmark-only pipeline in https://esrally.readthedocs.io/en/2.0.2/pipelines.html#benchmark-only


Getting further help:
*********************
* Check the log files in /rally/.rally/logs for errors.
* Read the documentation at https://esrally.readthedocs.io/en/2.0.2/.
* Ask a question on the forum at https://discuss.elastic.co/tags/c/elastic-stack/elasticsearch/rally.
* Raise an issue at https://github.com/elastic/rally/issues and include the log files in /rally/.rally/logs.

-------------------------------
[INFO] FAILURE (took 4 seconds)
-------------------------------
$ esrally --version
esrally 2.0.2
$

Also, I don't have the same issue with esrally installed using pip. I'll try to run esrally with Docker a bit later and come back with the logs.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.