Moving our three Elasticsearch nodes over to physical hosts and wanted to run things by you all before purchasing.
Looking at three Dell R730s w/ dual Xeon E5-2630 v3 2.4GHz, 128 GB of memory, two 250 GB OS drives and five 1 TB (RAID 5) data drives. The RAID controller will be a PERC H730P.
Any reason why we should skip RAID5/6 and go for multiple path.data? I am about to persuade my supervisor to do so too. Based on this http://www.raid-calculator.com/default.aspx, RAID5/6 boosts read speed while write speed remains the same.
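To put rough numbers on that trade-off, here is a minimal sketch of the usable capacity per node for the five 1 TB data drives mentioned above. The function name and layout labels are made up for illustration, and it only uses the textbook rules (RAID 5 gives up one disk to parity, RAID 6 gives up two), ignoring filesystem and controller overhead.

```python
def usable_capacity_tb(layout: str, disks: int, disk_tb: float) -> float:
    """Return usable capacity in TB for a simple disk layout.

    `layout` labels are hypothetical names used only in this sketch:
    "raid0", "multi_path" (one path.data entry per disk), "raid5", "raid6".
    """
    if layout in ("raid0", "multi_path"):
        # No parity: every disk contributes its full size.
        return disks * disk_tb
    if layout == "raid5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        # One disk's worth of space goes to parity.
        return (disks - 1) * disk_tb
    if layout == "raid6":
        if disks < 4:
            raise ValueError("RAID 6 needs at least 4 disks")
        # Two disks' worth of space goes to parity.
        return (disks - 2) * disk_tb
    raise ValueError(f"unknown layout: {layout}")


if __name__ == "__main__":
    for layout in ("raid0", "multi_path", "raid5", "raid6"):
        print(layout, usable_capacity_tb(layout, disks=5, disk_tb=1.0), "TB")
```

So with this hardware, RAID 5 costs 1 TB per node and RAID 6 costs 2 TB per node compared with giving each disk to Elasticsearch directly.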
RAID5 also suffers from write holes, which can cause lost data without you knowing.
If you use multiple path.data it is essentially RAID0, striping. But with RAID0 on an array, a single failed disk loses all data on the array; with multiple path.data, all you lose is the data on that disk. And if you have replicas you should be safe from that anyway, so you just replace the disk and move on.
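A quick sketch of that difference, counting how much local data one node can lose when a single disk dies. This is a worst-case estimate that assumes every disk is full; the function name and layout labels are hypothetical, and it says nothing about replicas held on other nodes.

```python
def capacity_at_risk_tb(layout: str, disks: int, disk_tb: float) -> float:
    """Worst-case TB of local data lost on one node when ONE disk fails.

    Assumes full disks. Layout labels are hypothetical names for this sketch.
    """
    if layout == "raid0":
        # The array is a single striped volume: one dead disk takes it all.
        return disks * disk_tb
    if layout == "multi_path":
        # One path.data entry per disk: only the failed disk's data is gone.
        return disk_tb
    if layout in ("raid5", "raid6"):
        # Parity covers a single-disk failure, so no local data is lost.
        return 0.0
    raise ValueError(f"unknown layout: {layout}")


if __name__ == "__main__":
    for layout in ("raid0", "multi_path", "raid5", "raid6"):
        print(layout, capacity_at_risk_tb(layout, disks=5, disk_tb=1.0), "TB at risk")
```

For the five 1 TB drives in this thread that is up to 5 TB at risk with hardware RAID0 versus up to 1 TB with multiple path.data.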
You're worried enough about your data to not use RAID0, but not enough to get coverage on the data store itself? I'm just being facetious, but you get my point.
I'd still highly recommend letting ES worry about the redundancy angle, but I understand the realities of situations.
Think twice. You have a backup, don't you? Can you recover from that backup?
Then think about your data. You have an enterprise data system to pull data from, where data is maintained. You can always rebuild the Elasticsearch index from that source which is physically separated from the ES cluster. If not, you do backups.
Then think about ES server redundancy. You use replica. If you use replica, then you have at least one whole server in redundancy mode.
Then think about the disks. RAID 5/6 can recover in the background, with spare disks and so on. This kills the performance of the server but in ES cluster mode, it kills the whole cluster. I repeat: the whole cluster performance will bog down when RAID5/6 is recovering. You will need to take the system offline. Just because of a single broken disk in a redundant server!
Here is my suggestion. Use RAID 0 on your copy of enterprise data with replica level 1 or higher. Sleep quiet. Let a whole server go down if a disk fails, it does not matter. Test your replica levels. Decommission a broken server, repair the disk, and bring ES back after repair. See the difference?
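For anyone wanting to try that workflow, here is a rough sketch of the two request bodies involved: setting the replica level on an index, and draining a node by IP before pulling it for repair. It only builds and prints the JSON and sends nothing. The index name `my-index` and the IP `10.0.0.3` are placeholders, and the helper names are made up for this example.

```python
import json


def replica_settings(replicas: int) -> dict:
    """Body for PUT /<index>/_settings to set the replica count."""
    if replicas < 0:
        raise ValueError("replicas must be >= 0")
    return {"index": {"number_of_replicas": replicas}}


def exclude_node_settings(ip: str) -> dict:
    """Body for PUT /_cluster/settings that moves shards off the node at `ip`.

    Uses a transient setting, so it does not survive a full cluster restart.
    """
    return {"transient": {"cluster.routing.allocation.exclude._ip": ip}}


if __name__ == "__main__":
    # Placeholder index name and IP address, not taken from the thread.
    print("PUT /my-index/_settings")
    print(json.dumps(replica_settings(1), indent=2))
    print("PUT /_cluster/settings")
    print(json.dumps(exclude_node_settings("10.0.0.3"), indent=2))
```

Once the disk is replaced and the node is back, clearing the exclude setting lets shards allocate to it again.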