Rally consumes all cluster threads and crashes with small clusters

Internally, Rally uses the default Elasticsearch Python client to issue bulk requests. By default, Rally issues requests as fast as it can. In your case (geonames), Rally uses a bulk size of 5,000 documents per request and 8 clients. Rally cannot "know" what you want to measure and therefore has no backoff logic as a normal client would (sometimes this is handy; see e.g. the blog post "Why am I seeing bulk rejections in my Elasticsearch cluster?").
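For illustration, a plain run against an existing cluster applies exactly those defaults with no throttling (the host address here is a placeholder):

```
esrally --pipeline=benchmark-only --track=geonames --target-hosts=127.0.0.1:9200
```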

There is also a mode in Rally where you can define a target throughput and Rally will attempt to achieve it (whether it succeeds depends on whether Elasticsearch can sustain that throughput); see also the Rally FAQ. This mode is primarily meant for benchmarking operations where you are interested in latency at a specific throughput (e.g. searches), rather than batch operations (e.g. bulk indexing).
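As a minimal sketch, throttling is defined per task in a track's challenge schedule; I'm assuming here that the bulk operation is named "index-append" as in the geonames track, and target-throughput is given in operations per second:

```json
{
  "schedule": [
    {
      "operation": "index-append",
      "warmup-time-period": 120,
      "clients": 8,
      "target-throughput": 100
    }
  ]
}
```

With target-throughput set, Rally tries to issue at most 100 bulk requests per second across all clients; without it, the task runs unthrottled.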

In your case you are probably interested in finding the breaking point while avoiding bulk rejections. I suggest two things:

  • You can change the bulk size of the track to e.g. 500 documents with --track-params="bulk_size:500" (see the geonames track README and the combined example after this list). We do not yet expose the number of indexing clients as a parameter, although that would be possible.
  • Bulk rejections (and any other errors) are recorded by Rally, and if you use a dedicated metrics store you can inspect them in more detail. However, in your case I have the impression that you want to treat a bulk rejection as a fatal error, so you could add the parameter --on-error=abort; Rally will then treat any HTTP error as fatal and abort the benchmark immediately (also shown in the example below).
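Putting both suggestions together, an invocation could look like this (pipeline and host address depend on your setup):

```
esrally --pipeline=benchmark-only --track=geonames --track-params="bulk_size:500" --on-error=abort --target-hosts=127.0.0.1:9200
```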