Need a tip on disk partitioning

Hello,

I am setting up a multi-node ES cluster. Each node has 6 SATA HDDs.
What is the recommended disk partitioning for ES?

  1. create RAID5 and use it as a single path.data?
  2. create RAID0 and use it as a single path.data (and rely on replicas for redundancy)?
  3. make 6 partitions and list them all in path.data? Will ES detect that all these disks reside on different hardware (on separate spindles) and spread disk load among them?
  4. something else?

Thanks!

6 disks is enough that it is probably worth playing with it a bit, maybe benchmarking loads and searches. My only extra advice is to have a couple of spares on hand in case of failure.

Option 1 (RAID5 as a single path.data) is fairly reasonable.

Option 2 (6 disks in RAID0) sounds like a nightmare to me from a mean-time-between-failures perspective: lose any one disk and the whole array goes with it.
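
To put rough numbers on that, here is a minimal sketch. It assumes every disk fails independently with the same annual failure rate; the 3% figure and the function name are illustrative assumptions, not measurements from real hardware.

```python
# Rough sketch: chance that a RAID0 stripe is lost within a year.
# ASSUMPTION: disks fail independently with the same annual failure
# rate (AFR). The 3% value is illustrative, not a measurement.

def stripe_failure_probability(disks: int, afr: float) -> float:
    """RAID0 has no redundancy, so the stripe is lost if any member fails."""
    return 1.0 - (1.0 - afr) ** disks

AFR = 0.03  # hypothetical per-disk annual failure rate

for disks in (1, 2, 3, 6):
    p = stripe_failure_probability(disks, AFR)
    print(f"{disks} disk(s) in RAID0: {p:.1%} chance of losing the stripe per year")
```

Under that assumed rate a six-disk stripe comes out at roughly 17% per year, against about 6% for a pair, and each failure takes every shard on the node with it.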

Option 3 (listing all six disks in path.data) is also reasonable. ES doesn't detect anything about these disks, but it will spread the load among them. Keep in mind, though, that path.data is designed around redundancy in a particular way: one shard lives entirely in one path.data path. So if you only have one shard on the node, you'll only use one path.data path. The idea is that if you lose a disk, you lose only the shards on it. If a shard were spread across multiple path.data paths and you lost a disk, you'd lose every shard that has files on that disk. Either way you can restore from Elasticsearch's replica copies, but you'd have to restore more if it were done the other way.
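
The difference in blast radius can be sketched with a toy model. This is not Elasticsearch code: the round-robin assignment, the mount points, and the shard names are all assumptions made for illustration, and real Elasticsearch picks a path using its own heuristics.

```python
# Illustrative model only, not Elasticsearch code.
# ASSUMPTION: shards are assigned to data paths round-robin.

def place_whole_shards(shards, paths):
    """Put each shard entirely on one path, as path.data does."""
    placement = {path: [] for path in paths}
    for i, shard in enumerate(shards):
        placement[paths[i % len(paths)]].append(shard)
    return placement

def shards_lost(placement, failed_path):
    """Shards that must be restored from replicas when one path dies."""
    return placement[failed_path]

paths = [f"/data{i}" for i in range(1, 7)]   # hypothetical mount points
shards = [f"shard-{i}" for i in range(12)]   # hypothetical shard names

placement = place_whole_shards(shards, paths)
lost = shards_lost(placement, "/data3")
print(f"whole-shard layout: lose {len(lost)} of {len(shards)} shards: {lost}")

# If every shard had files on every path, one dead disk would touch all of them.
print(f"spread-across-paths layout: lose {len(shards)} of {len(shards)} shards")

# A node holding a single shard uses only one of its six paths.
single = place_whole_shards(["shard-0"], paths)
print(f"paths in use with one shard: {sum(1 for s in single.values() if s)}")
```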

Another option is to RAID0 the disks in pairs and list each pair in path.data. You could RAID0 three disks together, but that starts to make me uncomfortable. RAID0 pairs are going to have better per-shard performance than 6 separate path.data entries. RAID0 triplets are similar, but you pay for it with more frequent array failures.

You could also RAID10 but I don't think that makes a ton of sense with Elasticsearch.

Honestly, if you don't want to spend time benchmarking I'd go with RAID0 pairs or RAID5, whatever you are more comfortable with.

Finally I should mention RAID6 - it is a useful thing to think about if you bought, say, 12 disks per node.
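
One way to see why disk count matters here is the capacity cost of parity: RAID5 spends one disk's worth of space on parity and RAID6 spends two. A minimal sketch, assuming equal-sized disks and ignoring hot spares and filesystem overhead (the function name is hypothetical):

```python
# Usable capacity for parity RAID levels.
# ASSUMPTION: all disks are the same size; spares and filesystem
# overhead are ignored.

def usable_fraction(disks: int, parity_disks: int) -> float:
    """Fraction of raw capacity left after parity (RAID5 = 1, RAID6 = 2)."""
    if disks <= parity_disks:
        raise ValueError("need more disks than parity disks")
    return (disks - parity_disks) / disks

layouts = (("RAID5", 1, 6), ("RAID6", 2, 6), ("RAID6", 2, 12))

for level, parity, disks in layouts:
    frac = usable_fraction(disks, parity)
    print(f"{level} on {disks} disks: {frac:.1%} usable, survives {parity} disk failure(s)")
```

With 12 disks, RAID6 gives up the same share of capacity as RAID5 does with 6, while tolerating two failures instead of one.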

Also: you may be quite disk-heavy here. If you intend to have these nodes be queried frequently, it might make sense to have fewer disks per node and more nodes. You also might want to look at hot/warm architectures, depending on your goal here.
