Storage Best Practices

In the Core Operations class, I saw a slide with the following details:

path.data vs. RAID0
‒ RAID0 will be slightly more performant
‒ path.data will allow a node to continue to function when a single disk fails
‒ e.g. on a machine with four 2 TB drives, at most 25% (2 TB) of the data will need to relocate

Can you explain the above in more detail, or with some examples, so that I can understand it well?

If you point Elasticsearch at multiple path.data mount points and one of those paths disappears (i.e. the disk fails), you lose only the data on that single disk (25% in the example above).

If you use RAID0 and a disk fails, the entire array and all data on it are lost.
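To make the slide's arithmetic concrete, here is a toy Python model (not Elasticsearch code) of how much data has to be recovered after one disk fails in the four-drive example. It assumes equal-sized disks and treats the path.data case as an upper bound of one disk's capacity; the function name `lost_on_single_failure` is hypothetical.

```python
# Toy model, not Elasticsearch code: terabytes that must be recovered when
# one of N equal disks fails, with the disks used either as separate
# path.data entries or striped together into a single RAID0 volume.


def lost_on_single_failure(num_disks: int, disk_tb: float, raid0: bool) -> float:
    """Return an upper bound on the data (TB) to relocate after one disk fails."""
    if raid0:
        # RAID0 stripes data across every disk, so losing one disk makes the
        # whole volume unreadable.
        return num_disks * disk_tb
    # With several path.data entries, each shard lives whole on one disk, so
    # only that disk's shards are gone: at most its own capacity.
    return disk_tb


total_tb = 4 * 2.0
for raid0 in (False, True):
    lost = lost_on_single_failure(4, 2.0, raid0)
    label = "RAID0" if raid0 else "path.data"
    print(f"{label:9s}: {lost:.0f} of {total_tb:.0f} TB ({lost / total_tb:.0%}) to relocate")
```

Running it reports 25% for the path.data layout and 100% for RAID0, which is the trade-off the slide describes in exchange for RAID0's slightly better performance.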

How does Elasticsearch handle the case where I/O to that single disk just freezes but the path doesn't disappear (cached entries can still be read, some writes may appear to succeed until they are fsynced, etc.)?

It would be better to start another thread in the #elasticsearch category. We're trying to keep this one to specific questions about our online training.

Thanks. Say I have a cluster with 3 nodes and a single index with 3 primary shards and one replica. When we configure multiple paths, I believe the index is striped across the paths but individual shards are not, i.e. each complete shard stays in one path.

Say I have configured multiple paths, e.g. path.data: /ds1, /ds2, and one of the disks (/ds1) fails on one of the nodes. Will shard reallocation still happen, i.e. will replicas on other nodes be promoted to primaries and the missing replicas (the shards lost in the disk failure) be recreated? Or does shard reallocation happen only when a whole node fails?
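The layout described in the question can be sketched in Python as follows. This is a hypothetical, simplified model: each shard copy is written whole into exactly one data path, and the path is chosen by "most free space", which only approximates Elasticsearch's real path selection. The shard names (`P0` for the primary of shard 0, `R1` for a replica of shard 1), sizes, and the `place_shard` helper are illustrative assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical model: a shard copy is never split across data paths; it is
# placed whole on one path. "Most free space" is a simplified stand-in for
# Elasticsearch's actual path-selection logic.


@dataclass
class DataPath:
    name: str
    free_gb: float
    shards: list = field(default_factory=list)


def place_shard(paths, shard_id, size_gb):
    """Put the whole shard copy on the path with the most free space."""
    target = max(paths, key=lambda p: p.free_gb)  # ties go to the first path
    target.shards.append(shard_id)
    target.free_gb -= size_gb
    return target.name


# With 3 primaries + 1 replica on 3 nodes, each node holds 2 shard copies.
node1_paths = [DataPath("/ds1", 100.0), DataPath("/ds2", 100.0)]
place_shard(node1_paths, "P0", 30.0)  # lands on /ds1
place_shard(node1_paths, "R1", 30.0)  # lands on /ds2

# If /ds1 fails, only the shard copies stored on it are lost.
failed = "/ds1"
lost = [s for p in node1_paths if p.name == failed for s in p.shards]
print(lost)  # ['P0']
```

Because shards are the unit of placement, a disk failure removes a set of whole shard copies rather than a slice of every shard.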

Shard reallocation happens at the shard level, not the node level. So if the shard on that bad disk is lost, Elasticsearch will recreate it; it does not wait for the whole host to drop out of the cluster.
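A minimal Python sketch of that shard-level recovery, assuming the 3-node, 3-primary, 1-replica layout from the question. The routing table, node names, and the "fewest shard copies" placement rule are hypothetical simplifications, not Elasticsearch's actual allocator; the point is only that recovery is triggered by the loss of one shard copy while the node itself stays in the cluster.

```python
# Hypothetical model of shard-level recovery. The "fewest shard copies"
# placement rule is a simplification, not Elasticsearch's real allocator.

# shard number -> which node holds the primary and which hold replicas
routing = {
    0: {"primary": "node1", "replicas": ["node3"]},
    1: {"primary": "node2", "replicas": ["node1"]},
    2: {"primary": "node3", "replicas": ["node2"]},
}
nodes = ["node1", "node2", "node3"]


def copies_on(node):
    """Count how many shard copies (primary or replica) a node holds."""
    return sum(
        (r["primary"] == node) + r["replicas"].count(node) for r in routing.values()
    )


def fail_copy(shard, node):
    """Handle the loss of one shard copy on one node (e.g. its disk died)."""
    r = routing[shard]
    if r["primary"] == node:
        # Promote a replica on another node to be the new primary.
        r["primary"] = r["replicas"].pop(0)
    else:
        r["replicas"].remove(node)
    # Rebuild the missing replica on a node that holds no copy of this shard.
    # In this model the node that lost the disk stays in the cluster, so it
    # is still a candidate (its other data path is healthy).
    holders = {r["primary"], *r["replicas"]}
    candidates = [n for n in nodes if n not in holders]
    target = min(candidates, key=copies_on)
    r["replicas"].append(target)
    return target


new_home = fail_copy(0, "node1")  # /ds1 on node1 held the primary of shard 0
print(routing[0])  # {'primary': 'node3', 'replicas': ['node1']}
```

Only shard 0 is touched: its replica on node3 is promoted and a new replica is built, while shards 1 and 2 keep their existing copies, including node1's replica of shard 1 on the surviving path.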
