Avoiding huge data replication in case of failure

Hi,
I'm using elastic in a production environment, and I thought of the following scenario:
Let's say I have N servers that form a single cluster, and I have N-1 replicas defined (so overall the cluster holds N-1 replica shards for every primary shard, which means each server ends up with a copy of every shard).
Each server has a group of disks configured in RAID 0 for improved performance, and Elasticsearch sees that RAID 0 array as a single block device.
If one of the disks in the i-th server fails, I basically have to throw away the whole array, since nothing guarantees the integrity of the data on that server's RAID 0 anymore. I would then shut down the server, install a new disk, and let Elasticsearch resynchronize that server back into the cluster.
Assuming the corrupted array held M terabytes of data, resynchronizing it would cost me a huge amount of unwanted network traffic.
Hence, I would expect the immediate solution to be some fault-tolerant RAID level in every server (e.g. RAID 50), but for some reason the Elasticsearch docs explicitly recommend the opposite:
"Using RAID 0 is an effective way to increase disk speed, for both spinning disks and SSD. There is no need to use mirroring or parity variants of RAID, since high availability is built into Elasticsearch via replicas."

You can also mount your disks separately (instead of putting them all into a RAID array) and use multiple data paths, so Elasticsearch can allocate different shards to different disks; see the sketch below. Depending on your load it might not be as efficient and fast as RAID 0, but in this case only the shards that were on the disk that died, plus any shards you were actively indexing into, will need to be copied after a failure.
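
For reference, that setup is just a list of data paths in elasticsearch.yml. A minimal sketch, assuming four separately mounted disks (the mount points are placeholders, adjust them to your own layout):

```
# elasticsearch.yml -- each disk mounted on its own, no RAID 0 array
# (mount points below are placeholders)
path.data:
  - /mnt/disk0/elasticsearch
  - /mnt/disk1/elasticsearch
  - /mnt/disk2/elasticsearch
  - /mnt/disk3/elasticsearch
```

Each shard's files live entirely on one of those paths, so losing a single disk only affects the shards that happened to be stored on it.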
