Storage setup questions - JBOD or not? - node availability on disk failure


(Bruno Lavoie) #1

Hello,

I know this kind of question has been asked before, but the information I've found is scattered and mostly outdated.

We're about to buy our nodes (5 to begin with) and we don't want to make bad choices regarding the storage approach. Our usage context is a common one: indexing a huge volume of high-velocity logs with the ELK stack.

At first sight, since replicas already provide redundancy, RAID 1 looks like a waste of space and money. As is recommended for HDFS setups, I'm thinking that a JBOD layout with one mount point per drive, listing them all in the path.data parameter, would be a good idea. I know the usage patterns of Hadoop and ES aren't exactly the same, but let me finish... :slight_smile:

As of ES 2.0, path.data behaves better in that shards are no longer striped across the specified directories. That means that if we lose one disk, only a small number of shards are lost and relocated across the cluster. Very interesting: a good point in favor of JBOD.
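For reference, a JBOD layout like that would be declared along these lines in elasticsearch.yml (a sketch only; the mount point paths are made up):

```yaml
# elasticsearch.yml -- one data path per physical disk (hypothetical mounts)
path.data:
  - /mnt/disk1/elasticsearch
  - /mnt/disk2/elasticsearch
  - /mnt/disk3/elasticsearch
  - /mnt/disk4/elasticsearch
```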

But... To begin with, in our ELK setup each (daily) index will by default have 5 shards (1 per node). Excluding replicas, that means that for one very hot index only 5 disks will be in high demand, while the others sit mostly idle, since the current day's index takes nearly all the load from monitoring.

Given those two points:

  • each shard is confined to a single path.data entry - good.
  • underutilized disk throughput with pure JBOD - bad.

I think a hybrid approach would be a nice tradeoff: create a certain number of RAID 0 arrays from the available disks. For example, with 20 disks we can create 5 RAID 0 arrays of 4 spindles each. This gives a good balance between resiliency and performance.
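To make the tradeoff concrete, here's a back-of-the-envelope sketch in Python. The figures are purely illustrative assumptions (20 disks of 4 TB each), and it ignores replicas and rebuild time; it only compares usable space against how much data a single disk failure takes out:

```python
# Rough comparison of the three layouts discussed above.
# Assumption: 20 identical disks of 4 TB each (illustrative, not a benchmark).
DISK_COUNT = 20
DISK_TB = 4

def jbod():
    # 20 independent mount points listed in path.data.
    usable = DISK_COUNT * DISK_TB           # all raw space is usable
    lost_on_failure = 1 * DISK_TB           # one disk's shards to re-replicate
    return usable, lost_on_failure

def raid0_groups(groups=5):
    # e.g. 5 RAID 0 arrays of 4 spindles each, one mount point per array.
    usable = DISK_COUNT * DISK_TB           # RAID 0 wastes no space
    lost_on_failure = (DISK_COUNT // groups) * DISK_TB  # the whole array is gone
    return usable, lost_on_failure

def raid10():
    # Single RAID 1+0 array: half the raw space goes to mirror copies.
    usable = DISK_COUNT * DISK_TB // 2
    lost_on_failure = 0                     # a mirror absorbs one disk failure
    return usable, lost_on_failure

for name, fn in [("JBOD", jbod), ("5x RAID 0", raid0_groups), ("RAID 10", raid10)]:
    usable, lost = fn()
    print(f"{name:10s} usable={usable} TB, re-replicated on 1 disk loss={lost} TB")
```

So JBOD minimizes re-replication traffic per failure, the RAID 0 groups trade a bigger blast radius for better per-mount throughput, and RAID 10 halves usable capacity to avoid relocation entirely.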

Is it a good approach?

Another important point for us: no matter how our disks are configured, does an ES node misbehave when one of the path.data mount points fails? Does ES simply mark the dead mount point as unusable? I hope so... And does it handle hot swapping of disks well?

In the case of a disk failure, if the node misbehaves and goes down, it will turn the cluster yellow and generate a lot of traffic to bring it back to green. With only 5 highly loaded nodes, the loss of one node can have a huge impact on the remaining ones. With more nodes, the impact of one node leaving would be less apparent.

For now, the licensing costs of Shield and Watcher keep us at a maximum of 5 nodes.

So, to play it safe, should we rather use a RAID 1+0 array with a single mount point?
That wastes space and increases storage costs, but gives a more stable cluster. Beyond the performance aspect, it also undercuts the point of having replicas.

I know it always depends, and we must trade off costs, stability, ease of operation, performance, space per node vs. number of nodes, etc.

Thanks
Bruno Lavoie


(Mark Walkom) #2

Yep.

It'll be unable to write to the shards on that mount/path, obviously, but it'll keep going. I'm not sure if it deals with hot swapping.


(Bruno Lavoie) #3

Fine, I found nothing in the docs about that...
Maybe the only way is to test it...

Thanks

