Chaos engineering in Elasticsearch

How can I break Elasticsearch in a controlled manner so that I can test its resiliency in my environment? I understand that we could tamper with external factors such as memory, network, and CPU, but I was wondering whether anything is built into Elasticsearch, or shipped alongside it as a plugin for development, that helps with this kind of testing. If nothing of this sort exists, how could I create one, ideally in a form I could contribute back to the community so that others can reuse it?

Any pointers are appreciated.

The class org.elasticsearch.discovery.AbstractDisruptionTestCase is the base class for a number of test suites that build a cluster of Elasticsearch nodes (running in a single JVM) and test its behaviour under various network disruptions.

There are other tests in the test suite that apply other kinds of disruption too. A good starting point is to look at the implementations of org.elasticsearch.test.disruption.ServiceDisruptionScheme and see where they are used.

We have also been running some tests, particularly benchmarks using Rally, in low-memory environments, and improving how Elasticsearch deals with memory pressure as a result.

I have also done some fruitful experimentation with a cluster running on a single machine in Docker containers, applying network disruptions using iptables rules.
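For anyone wanting to try something similar, here is a rough sketch of how a two-sided network partition between containers could be scripted. It is not the exact setup described above. It assumes that each container has iptables installed (the stock Elasticsearch image may not), that `docker exec --privileged --user root` gives enough permission to change the rules, and that the container names and IP addresses (`es01`, `172.18.0.2`, and so on) are hypothetical placeholders for your own.

```python
import subprocess
from itertools import product


def partition_commands(side_a, side_b, action="-A"):
    """Build docker-exec argv lists that drop traffic between two groups.

    side_a and side_b map container name -> IP address (hypothetical values).
    action is "-A" to append the DROP rules or "-D" to delete them again.
    """
    commands = []
    for (name_a, ip_a), (name_b, ip_b) in product(side_a.items(), side_b.items()):
        # Drop packets arriving at A from B, then at B from A, so the
        # partition is symmetric.
        for container, peer_ip in ((name_a, ip_b), (name_b, ip_a)):
            commands.append([
                "docker", "exec", "--privileged", "--user", "root", container,
                "iptables", action, "INPUT", "-s", peer_ip, "-j", "DROP",
            ])
    return commands


def run(commands, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Assumed topology: isolate es01 from es02 and es03.
    minority = {"es01": "172.18.0.2"}
    majority = {"es02": "172.18.0.3", "es03": "172.18.0.4"}
    run(partition_commands(minority, majority))               # start the partition
    run(partition_commands(minority, majority, action="-D"))  # heal it
```

With `dry_run=True` the script only prints the commands, which is a safe way to check them before applying anything. Building the rule list as plain data also makes it easy to undo a disruption with exactly the same rules that created it.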

Mike McCandless also wrote up some of the testing he did on Lucene's safety in the event of power loss.

I hope these are useful pointers.

