Does anyone have experience with using Ceph as storage for Elasticsearch?
I am looking for a way to make the storage layer more fault tolerant at the OS level. I know you can use multiple replicas for this, but I am investigating ways to prevent shard failures caused by failing disks or RAID controllers.
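For reference, replicas are configured per index; in the ES 1.x/2.x era you could also set a cluster-wide default in elasticsearch.yml (a minimal sketch, the value is just an example):

```yaml
# elasticsearch.yml -- minimal sketch; in ES 1.x/2.x index defaults could
# still be set here. With 2 replicas, each shard has 3 copies in total,
# so losing a disk (or a whole node) costs redundancy, not data.
index.number_of_replicas: 2
```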
Here goes... Currently we have servers with 7 disks of ES data each. Each data node gets a path.data list of all 7 local disks. With ES 1.7, data for a single shard is spread over all disks using a 'least used' strategy. With 2.0, data for a single shard is placed on 1 disk, not spread across all 7. Great for resiliency, because it eliminates partially failed shards when 1 of the 7 disks dies.
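For anyone following along, the setup looks roughly like this (a minimal sketch; the mount points are made up):

```yaml
# elasticsearch.yml -- minimal sketch, mount points are hypothetical.
# ES 1.7 stripes a shard's files across these paths ('least used');
# ES 2.0 places each shard entirely on one of them.
path.data:
  - /data/disk1
  - /data/disk2
  - /data/disk3
  - /data/disk4
  - /data/disk5
  - /data/disk6
  - /data/disk7
```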
But... having all the data for 1 shard on 1 disk makes that disk hot while indexing, much hotter than with 1.7, because that version spreads write ops over all disks. So with ES 2.0 we are starting to think spinning disks will not work anymore.
And so we are investigating alternatives, short of throwing in flash storage...
I don't really run clusters like this anymore, unfortunately, but I do know ES a bit.
I do think you're overly worried, though. Ceph will likely hurt performance more than hitting a single disk does (don't forget you need to hit the network with Ceph), and without testing it's hard to say that the hotter write load will actually kill a disk.
I cannot add new disks to the existing servers, so RAID would require me to replace all disks with larger ones (increasing total seek latency) or add more servers. RAID is on the table, but it has pros and cons.