Correct number of shards for 5.3 TB indices

We have indices with total size of 5.3 TB. We have one replica for this indices. So, in total 10.6 TB. We are planning to start re-index process to make the shards smaller (currently 1TB per shard). We are planning to create new indices with 200 shards. This will make the shard size 27 GB (approximately). We have 33 node cluster and few small indices share this cluster. Please let me know if there is any disadvantages of having 200 shards.

That's a reasonable size. Why is the index so large though?

Because, this indices is not time partitioned. We are planning to implement it in future.

I read that there is overhead in having more shards. Will having 200 shards (primary) + 200 (replica) for total no. of 33 nodes affect the performance negatively?

We are at 1.7 (java) and 1.7.2 (elasticsearch).

You'd have to test it to be sure. But given it's only 12 shards per node, I wouldn't imagine a big problem.

I will try to get this tested. What if we have 60 shards (p) + 60 shards (r)? This will reduce the no of shards per node ratio to 1:4. Is there any advantage of this proposal to 200 + 200 (we discussed it earlier)?

Generally, more shards = higher indexing speeds, less shards = faster query speeds.

But it's all stuff you need to test as more/less shards is relative.

2 Likes

ok. Thanks for your help on this issue.

Is there a quick way find out if an indices is bound by read (query) or write (indexing)?

Use some kind of monitoring.

Thanks again.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.