I would like to know how shards are allocated across the disks within the ES nodes. I have a cluster with 6 data nodes, and each node has 4 spinning disks (/data/disk001, /data/disk002, /data/disk003, /data/disk004) of 512 GB each.
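For reference, the four disks are set up as multiple data paths in elasticsearch.yml, along these lines:

```yaml
# elasticsearch.yml on each data node: one path.data entry per disk
path.data:
  - /data/disk001
  - /data/disk002
  - /data/disk003
  - /data/disk004
```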
When I create an index with 6 shards, which disk will each shard be allocated to? Is there a formula/calculation?
How will indexing performance be on these nodes?
If the shards are spread across the disks, can we increase the number of shards to 6 ES nodes * 4 disks = 24 shards per index to improve indexing performance?
As for performance, it seems likely that you'll get better indexing performance using 24 shards than, say, 12 shards. Use Rally to benchmark a few different configurations so you have better data to guide your decisions.
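If you want to try the 24-shard layout, the shard count is just an index setting at creation time; something like this would do it (my-index is a placeholder name):

```
PUT my-index
{
  "settings": {
    "index.number_of_shards": 24,
    "index.number_of_replicas": 1
  }
}
```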
I would like to understand how disk utilization is taken into account during allocation. The disks on a node are:
sda
sdb
sdc
sdd
Of the above 4 disks, which disk handles the indexing for a given shard? I hope a single shard is not spread across all 4 disks, but rather that each shard is mapped to an individual disk. So what is the calculation for assigning shards to the disks?
Correct, a single shard will only live on one disk. As for which disks get the allocation, my understanding/experience is that the allocator attempts to balance the shard count, not the disk usage. So let's say you have:
sda, 70% full, 10 shards
sdb, 10% full, 12 shards
sdc, 10% full, 12 shards
sdd, 10% full, 12 shards
If you create a new single-shard index, you might think it would go to one of the 3 nearly empty disks. Instead it'll pile onto the 70% full disk, because that disk only has 10 shards.
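You can check the per-path breakdown yourself; node stats report filesystem usage per data path:

```
GET _nodes/stats/fs
```

Each entry under fs.data in the response corresponds to one path.data mount, with its total and available bytes.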
Now, once that 70% full disk reaches 85% usage, the default low disk watermark will kick in and it won't be assigned new shards.
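The watermarks are dynamic cluster settings, so you can tune them if the defaults don't fit your disks; a minimal sketch (the values shown are the defaults):

```
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "85%",
    "cluster.routing.allocation.disk.watermark.high": "90%"
  }
}
```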