Shard Distribution Across Multiple Disks

Hi

I would like to understand how shards are allocated across the disks of an ES node. My cluster has 6 data nodes, and each node has 4 spinning disks of 512 GB each:
/data/disk001, /data/disk002, /data/disk003, /data/disk004.

When I create an index with 6 shards, which disks will the shards be allocated to? Is there a formula or calculation?

What will indexing performance be like on such a node?

If shards are allocated to individual disks, can we increase the number of shards to 6 ES nodes * 4 disks = 24 shards per index to improve indexing performance?

Two main default rules determine placement:

  1. Replicas won't be placed on the same node as the primary shard, even if it's on a different disk
  2. New shards won't be placed on disks full enough to trigger the low/high watermarks

Easiest thing to do is just try it and see where things land:

PUT mohankumar_test
{
    "settings" : {
        "index" : {
            "number_of_shards" : 24, 
            "number_of_replicas" : 1 
        }
    }
}
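
For anyone who wants to try other node/disk counts, here is a minimal Python sketch that builds the same request body from the numbers in the question. The function name `index_settings` is made up for illustration; it only does the arithmetic and does not talk to a cluster. Once the index exists, `GET _cat/shards/mohankumar_test?v` lists which node each shard copy landed on.

```python
import json

def index_settings(data_nodes: int, disks_per_node: int, replicas: int = 1) -> dict:
    """Build a create-index body with one primary shard per disk in the cluster.

    Hypothetical helper for illustration only; it just multiplies the counts.
    """
    return {
        "settings": {
            "index": {
                "number_of_shards": data_nodes * disks_per_node,
                "number_of_replicas": replicas,
            }
        }
    }

body = index_settings(data_nodes=6, disks_per_node=4)
primaries = body["settings"]["index"]["number_of_shards"]
# Every primary gets `number_of_replicas` extra copies.
total_copies = primaries * (1 + body["settings"]["index"]["number_of_replicas"])

print(json.dumps(body, indent=4))
print("primaries:", primaries, "total shard copies:", total_copies)
```

Note that with 1 replica the cluster holds 48 shard copies in total, so an even spread would put 2 copies on each of the 24 disks rather than 1.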

As for performance, it seems likely that you'll get better indexing performance using 24 shards than, say, 12 shards. Use Rally to benchmark a few different configurations so you have better data to guide your decisions.

Thanks loren,

I would like to know how shards are allocated across the disks:

sda
sdb
sdc
sdd

Of the 4 disks above, which ones are involved in indexing a given shard? I assume a single shard is not spread across the 4 disks and is instead mapped to one individual disk. So what calculation is used to assign shards to disks?

Correct, a single shard will only live on one disk. As for which disks get the allocation, my understanding/experience is that the allocator attempts to balance the shard count, not the disk usage. So let's say you have:

  1. sda, 70% full, 10 shards
  2. sdb, 10% full, 12 shards
  3. sdc, 10% full, 12 shards
  4. sdd, 10% full, 12 shards

If you create a new single shard index, you might think it would go to one of the 3 nearly empty disks. Instead it'll pile on to the 70% full disk, because it only has 10 shards.
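
To make that concrete, here is a toy Python model of the behavior described above. It is not Elasticsearch's allocator code, just a sketch under the assumption that the disk with the fewest shards wins; `Disk`, `pick_by_shard_count`, and `pick_by_free_space` are made-up names.

```python
from dataclasses import dataclass

@dataclass
class Disk:
    name: str
    used_pct: int   # percentage of the disk already in use
    shards: int     # number of shards currently on the disk

# The four disks from the example above.
disks = [
    Disk("sda", used_pct=70, shards=10),
    Disk("sdb", used_pct=10, shards=12),
    Disk("sdc", used_pct=10, shards=12),
    Disk("sdd", used_pct=10, shards=12),
]

def pick_by_shard_count(candidates):
    """Assumed behavior: choose the disk holding the fewest shards."""
    return min(candidates, key=lambda d: d.shards)

def pick_by_free_space(candidates):
    """What one might intuitively expect: choose the emptiest disk."""
    return min(candidates, key=lambda d: d.used_pct)

print("by shard count:", pick_by_shard_count(disks).name)  # sda
print("by free space: ", pick_by_free_space(disks).name)   # sdb (first of the tied disks)
```

The two strategies disagree on exactly the case in the example: balancing by count sends the new shard to the 70% full disk.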

Now, once that 70% full disk becomes 85% full, the default low disk watermark (85%) will kick in and the disk won't be assigned new shards.
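
A rough Python sketch of how the watermark changes the outcome, again as a toy model and not the real allocator. The 85% value is the default for `cluster.routing.allocation.disk.watermark.low`; the function names and the choice to treat exactly 85% as "over" are assumptions made for this illustration.

```python
LOW_WATERMARK_PCT = 85  # default low disk watermark

def eligible_disks(disks, low_watermark=LOW_WATERMARK_PCT):
    """Keep only disks below the low watermark.

    `disks` maps a disk name to a (used_pct, shard_count) tuple.
    Treating usage equal to the watermark as ineligible is an assumption.
    """
    return {name: info for name, info in disks.items() if info[0] < low_watermark}

def pick_disk(disks):
    """Among eligible disks, pick the one with the fewest shards (assumed rule)."""
    candidates = eligible_disks(disks)
    if not candidates:
        return None  # nowhere to put the shard in this toy model
    return min(candidates, key=lambda name: candidates[name][1])

before = {"sda": (70, 10), "sdb": (10, 12), "sdc": (10, 12), "sdd": (10, 12)}
after = {"sda": (85, 10), "sdb": (10, 12), "sdc": (10, 12), "sdd": (10, 12)}

print(pick_disk(before))  # sda: fewest shards, still under the watermark
print(pick_disk(after))   # sdb: sda is now excluded by the watermark
```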

Thanks for sharing your experience, loren.
