I would like to know how shards are allocated across the disks within the ES nodes. I have a cluster with 6 data nodes, and each node has 4 spinning disks (/data/disk001, /data/disk002, /data/disk003, /data/disk004) of 512 GB each.
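For reference, the four disks are set up as multiple data paths in elasticsearch.yml, along these lines:

```yaml
# elasticsearch.yml on each data node: one path.data entry per disk
path.data:
  - /data/disk001
  - /data/disk002
  - /data/disk003
  - /data/disk004
```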
When I create an index with 6 shards, which disk will each shard be allocated to? Is there a formula/calculation?
How will indexing performance be on these nodes?
If the shards are spread across the disks, can we increase the number of shards to 6 ES nodes * 4 disks = 24 shards per index to improve indexing performance?
As for performance, it seems likely that you'll get better indexing performance using 24 shards than, say, 12 shards. Use Rally to benchmark a few different configurations so you have better data to guide your decisions.
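If you want to try the 24-shard layout, the shard count is just an index setting at creation time; something like this would do it (my-index is a placeholder name):

```
PUT my-index
{
  "settings": {
    "index.number_of_shards": 24,
    "index.number_of_replicas": 1
  }
}
```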
I would like to understand how disk utilization is taken into account during allocation. The disks on a node are:
sda
sdb
sdc
sdd
Of the above 4 disks, which disk handles the indexing for a given shard? I hope a single shard is not spread across all 4 disks, but rather that each shard is mapped to an individual disk. So what is the calculation for assigning shards to the disks?
Correct, a single shard will only live on one disk. As for which disks get the allocation, my understanding/experience is that the allocator attempts to balance the shard count, not the disk usage. So let's say you have:
sda, 70% full, 10 shards
sdb, 10% full, 12 shards
sdc, 10% full, 12 shards
sdd, 10% full, 12 shards
If you create a new single-shard index, you might think it would go to one of the 3 nearly empty disks. Instead it'll pile onto the 70% full disk, because that disk only has 10 shards.
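You can check the per-path breakdown yourself; node stats report filesystem usage per data path:

```
GET _nodes/stats/fs
```

Each entry under fs.data in the response corresponds to one path.data mount, with its total and available bytes.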
Now, once that 70% full disk reaches 85% usage, the default low disk watermark will kick in and it won't be assigned new shards.
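The watermarks are dynamic cluster settings, so you can tune them if the defaults don't fit your disks; a minimal sketch (the values shown are the defaults):

```
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "85%",
    "cluster.routing.allocation.disk.watermark.high": "90%"
  }
}
```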