Esrally to test resiliency of the server

I would like to test the resiliency of an Elasticsearch cluster when a server fails (one Elasticsearch node is down).

I am following the below process

  • Using esrally as the load generator and to deploy a 4-node cluster.
    * "number_of_replicas=1"
  • Using the event data track with the challenge "--challenge=index-logs-fixed-daily-volume"
  • When I shut down one of the Elasticsearch nodes, I get an "asyncio.exceptions.CancelledError" error and the challenge does not complete.

Since there are 2 replicas, my expectation was that the Elasticsearch workload would continue with warnings, but that is not happening.

Any pointers on how to achieve this test with Rally?

Hi,

I fear Rally may not be the right tool for this job. Its main purpose is benchmarking, and one of its core assumptions is that the system is in a steady state so we can perform reproducible measurements.

What you're after seems more like a QA test ensuring stability of the cluster after it has lost a node. One of the features of the Python client is sniffing, which lets it learn about the current cluster topology so it knows about disappearing or new cluster nodes. As a corollary of requiring a system in steady state, Rally does not enable cluster sniffing. We also do not retry failed requests but only record that a failure has happened, as retries would skew measurement results as well.

It would probably make sense to write a small test harness instead, e.g. using the Elasticsearch Python client with sniffing enabled, and maybe also implement retry handling for failed requests. You could still leverage the data sets that we offer with Rally to generate some load. At the end of your test run you'd likely also want to assert that all documents have been ingested successfully.
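
As a rough sketch of what such a harness could look like (not something Rally provides): the block below assumes the 8.x Elasticsearch Python client, where sniffing is switched on with `sniff_on_start` and `sniff_on_node_failure` (the 7.x client names the second option `sniff_on_connection_fail`). The helper names `make_client`, `index_with_retry`, and `run_resiliency_check`, as well as the index name `resiliency-test`, are made up for illustration.

```python
import time
from typing import Any, Callable, Iterable, List


def make_client(hosts: List[str]):
    """Build a client with sniffing and transport-level retries enabled."""
    # Imported lazily so the retry helper below can be tried without the package.
    from elasticsearch import Elasticsearch

    # Option names follow the 8.x client (assumption); adjust for 7.x.
    return Elasticsearch(
        hosts,
        sniff_on_start=True,         # discover all nodes up front
        sniff_on_node_failure=True,  # re-discover when a node disappears
        max_retries=5,
        retry_on_timeout=True,
    )


def index_with_retry(
    index_one: Callable[[Any], Any],
    docs: Iterable[Any],
    max_attempts: int = 5,
    backoff_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Any]:
    """Send each doc through index_one, retrying on errors.

    Returns the docs that still failed after max_attempts.
    """
    failed = []
    for doc in docs:
        for attempt in range(1, max_attempts + 1):
            try:
                index_one(doc)
                break
            except Exception:
                # Broad on purpose for this sketch; a real harness would
                # catch only the client's connection and transport errors.
                if attempt == max_attempts:
                    failed.append(doc)
                else:
                    sleep(backoff_s * attempt)  # linear backoff
    return failed


def run_resiliency_check(hosts: List[str], docs: Iterable[Any],
                         index: str = "resiliency-test") -> None:
    """Ingest docs (stop a node while this runs), then verify the count."""
    client = make_client(hosts)
    docs = list(docs)
    failed = index_with_retry(
        lambda d: client.index(index=index, document=d), docs
    )
    client.indices.refresh(index=index)  # make all writes visible to count
    indexed = client.count(index=index)["count"]
    # Assumes the index was empty before the run.
    assert not failed and indexed == len(docs), (len(failed), indexed)
```

You would call something like `run_resiliency_check(["http://node1:9200"], docs)` with documents taken from one of the Rally data sets, stop a node partway through, and let the final assertion tell you whether anything was lost. Note that a retried request whose first attempt actually succeeded can create a duplicate document unless you set explicit document IDs.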

Hope that helps.

Daniel
