How does one decide the number of hosts? I understand the answer to such questions is always "it depends", but is there a strategy I can use to determine this number? I went through this document: https://www.elastic.co/guide/en/elasticsearch/guide/current/capacity-planning.html. It recommends running a test to determine the number of primary shards (index size / max shard size). Once I have that number, what would be a good way to find how many shards can be placed on one host? Ideally 1 shard per host would be great, but that doesn't seem reasonable given my index size. I presume performance won't be the same when multiple shards are located on one machine. Any pointers would be helpful.
How big is your index?
The index is around 1.5 TB: about 4 billion documents across 200 shards. We have 10 data nodes and 3 master nodes. Each data node has 48 cores, 144 GB of RAM, and SSD storage.
So about 7.5 GB per shard?
I'd increase the shard size a fair bit, maybe by 5 times. That would be roughly 37.5 GB per shard, or about 40 shards instead of 200.
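For anyone following the arithmetic in this thread, here is a rough sketch in Python. It assumes 1 TB = 1000 GB, counts primary shards only (replicas were not mentioned above), and spreads shards evenly across the 10 data nodes. The function names `shard_plan` and `shards_for_target` are made up for illustration and are not part of any Elasticsearch API.

```python
import math

def shard_plan(index_size_gb, shard_count, data_nodes):
    """Return (GB per shard, shards per data node), primaries only."""
    size_per_shard = index_size_gb / shard_count
    # Round up: an uneven split leaves at least one node with the extra shard.
    shards_per_node = math.ceil(shard_count / data_nodes)
    return size_per_shard, shards_per_node

def shards_for_target(index_size_gb, target_shard_gb):
    """Primary shards needed so that no shard exceeds the target size."""
    return math.ceil(index_size_gb / target_shard_gb)

INDEX_SIZE_GB = 1500  # "around 1.5 TB", assuming 1 TB = 1000 GB
DATA_NODES = 10

# Current layout from the thread: 200 shards on 10 data nodes.
current_size, current_per_node = shard_plan(INDEX_SIZE_GB, 200, DATA_NODES)

# Suggested change: shards about 5 times larger.
target_size = current_size * 5
new_shard_count = shards_for_target(INDEX_SIZE_GB, target_size)
new_size, new_per_node = shard_plan(INDEX_SIZE_GB, new_shard_count, DATA_NODES)

print(f"current:  {current_size} GB/shard, {current_per_node} shards/node")
print(f"proposed: {new_size} GB/shard, {new_shard_count} shards, "
      f"{new_per_node} shards/node")
```

This only covers the size arithmetic. Whether a node copes well with that many shards of that size still has to be checked with the kind of load test the capacity-planning guide describes.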