Too big a shard vs Too many shards

Hi Elastic Team,

I am in a bit of a dilemma: I currently have a limited number of nodes in the cluster, but the amount of data is fairly large, so I need to decide whether to have more shards across the whole cluster or have each shard hold more data.

I understand that having too many shards and making shards too large both affect search speed negatively, but which one is generally the better choice?

To be more specific, if I go with the option of having a lot of shards, I will end up with around 60 shards per node in my cluster, each shard containing 50GB, which is the rough limit advised by various sources on the web.

And if I go with, say, 5 shards per node, each shard will hold roughly 600GB of data. If I try to get closer to the ideal figure of 1 shard per node, the amount of data per shard goes up even further.

Or should I choose a number somewhere in between? Maybe having each shard hold 200GB, with 15 shards per node, is better than going for either of the extremes above?
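To make the trade-off concrete, here is a minimal sketch of the arithmetic behind the three options above. The 3TB-per-node figure is an assumption inferred from the numbers in the question (60 × 50GB = 5 × 600GB = 3TB), not a recommendation:

```python
# Illustrative only: assumes ~3TB of data per node, as implied by the
# figures in the question (60 shards x 50GB, or 5 shards x 600GB).
TOTAL_DATA_PER_NODE_GB = 3000

def shards_per_node(shard_size_gb):
    """How many shards each node ends up with for a given target shard size."""
    return TOTAL_DATA_PER_NODE_GB // shard_size_gb

print(shards_per_node(50))   # many small shards  -> 60 per node
print(shards_per_node(600))  # few large shards   -> 5 per node
print(shards_per_node(200))  # middle ground      -> 15 per node
```

The point is that for a fixed data volume per node, shard size and shard count are just two sides of the same trade-off.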

Many thanks for your help.



I would recommend keeping shard size below 50GB. It is not only about search speed, but also the fact that smaller shards are easier to move around if/when a node fails, and they can also be redistributed more evenly.

Thanks Chris, I forgot to mention that point as well; indeed, bigger shards will hog the network when shards are moved around. So based on your comment I'd go with smaller shards, unless there are other considerations.

The important aspect is the number of primary shards per index, not per node per se. With an index size of 500G you'll probably need at least 10 primary shards, or you could split the large index into several smaller ones based on time, type, or some other document property available at indexing time. That way you'll have several smaller indices on each node, which makes indexing, maintenance, and possibly also searching much lighter.
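The time-based split suggested above can be sketched as follows. The index names mirror the quarterly naming scheme shown in the output below; the shard and replica counts are illustrative, and the dict is the settings body you would send when creating each index:

```python
# Hypothetical sketch: one smaller index per quarter instead of a single
# large index. Names and shard counts are illustrative assumptions.

def settings_for(primary_shards=5, replicas=1):
    """Settings body used when creating each time-based index."""
    return {
        "settings": {
            "number_of_shards": primary_shards,   # fixed at creation time
            "number_of_replicas": replicas,       # can be changed later
        }
    }

# One index per quarter, following the naming scheme in the listing below:
quarterly = [f"pulse-2016-q{q}-v2" for q in range(1, 5)]
bodies = {name: settings_for() for name in quarterly}
print(sorted(bodies))
```

Each index can then be created, retired, or deleted independently of the others.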

I currently run a 4-node setup with 760 million docs (495G) spread over six time-based indices, each of which has the default number (5) of primary shards, thus 30 primary shards in all. This setup handles well both re-allocation of shards, high-volume indexing (I rebuilt the cluster indices in 24 hours by indexing on average 3200+ docs per second), and searching (I have built word clouds in Kibana across all six indices).

me@workhorse:~$ spyes -host esp04 -indexinfo -total -index "pulse-2015-7(col|q[1-4]|[0-1][0-9])-v2"
Master node: esp01
Cluster: pulse status: green
Total: docs: 762.61M deleted: 1.74M disksize: 495.48G
pulse-2015_col-v2 is open, pri_shards: 5, doc_count: 356.645.324, del_count: 209.534, disk_usage: 242.24G
pulse-2016-q1-v2 is open, pri_shards: 5, doc_count: 74.368.037, del_count: 88.011, disk_usage: 44.03G
pulse-2016-q2-v2 is open, pri_shards: 5, doc_count: 96.240.821, del_count: 385.607, disk_usage: 52.96G
pulse-2016-q3-v2 is open, pri_shards: 5, doc_count: 100.634.937, del_count: 470.011, disk_usage: 54.14G
pulse-2016-q4-v2 is open, pri_shards: 5, doc_count: 106.147.357, del_count: 389.606, disk_usage: 61.90G
pulse-2017-01-v2 is open, pri_shards: 5, doc_count: 35.343.659, del_count: 52.818, disk_usage: 21.88G
pulse-2017-02-v2 is open, pri_shards: 5, doc_count: 30.276.545, del_count: 228.316, disk_usage: 18.33G

Hi Bernt,

Thanks for your comment.

Interesting, but from what I understand the number of indices doesn't physically affect anything as long as the total number of shards stays the same; what it really boils down to is the shard layout, i.e. how many shards are being queried, and across how many nodes, simultaneously. In other words, to me, indices are more of a "management thing" than a "performance thing".

Quoting the definitive guide:

Searching 1 index of 50 shards is exactly equivalent to searching 50 indices with 1 shard each: both search requests hit 50 shards.

So for your 4-node setup, let's assume as an example that a search query results in requests hitting 5 shards on each node; the performance would be the same whether those 5 shards belong to 1 index or to 5 different indices.

Unless I'm misunderstanding something.

From what I've gathered from various advice on the web, in a utopian cluster each node would hold 1 shard (either primary or replica) containing no more than 50GB of data (for ease of moving shards around), and as the total amount of data grows you would simply add nodes to the cluster (in reality only up to a certain limit due to full-mesh network constraints; I was advised that the number of nodes per cluster should not exceed 200).

In reality we can't practically achieve the above, so each user has to work out the best compromise for their own case.

In my case, my strategy is to first work out the total number of shards, based on some set constraints: for example, each shard should hold no more than 50GB of data (or 1 shard per node, etc.). I know my forecasted total amount of data, so I can estimate the total expected number of shards in the cluster. My data is time-based, so I will have daily indices, and I plan to keep, for example, 30 days of retention, giving me 30 indices in the whole cluster. From that I can calculate the number of shards per index: the total number of shards in the whole cluster divided by 30. I will then use this value as a guide when setting the number of primaries and replicas in the index template.
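The sizing strategy described above can be sketched as a small calculation. The 15TB forecast is purely an example input; the 50GB-per-shard and 30-day figures come from the constraints stated above:

```python
import math

# Sketch of the sizing strategy described above. The data forecast is an
# example; substitute your own.

def shards_per_index(total_data_gb, max_shard_size_gb=50, retention_days=30):
    """Estimate primary shards per daily index from a total-data forecast."""
    total_shards = math.ceil(total_data_gb / max_shard_size_gb)  # cluster-wide
    return math.ceil(total_shards / retention_days)              # per daily index

# e.g. forecasting 15TB retained across 30 daily indices, 50GB per shard:
print(shards_per_index(15000))  # -> 10 primary shards per daily index
```

This gives a starting value for the index template; replicas then come on top of it.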


Yes, in many ways an index is a "management thing", and it's certainly true, as you point out, that searching 1 index of 50 shards is exactly equivalent to searching 50 indices with 1 shard each. However, there are aspects of having just one big index that will affect both performance and management of the cluster.

Data is always indexed into primary shards, and the number of primary shards is written in stone once the index has been created; it can never be changed for that index. This means you can't grow your cluster dynamically by increasing the number of primary shards (and nodes) later on. If you start out with 30 primary shards, it won't help to go beyond 30 nodes in the cluster, because you won't get any more primary shards (the surplus nodes may still improve searches, though, since they can hold replica shards). Thus the number of documents per shard will slowly grow beyond the 50G limit, and there isn't much you can do about it, except run expensive deletes in the index or perform a full reindex into a new index configured with more primary shards.
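To illustrate the point above: with a fixed number of primaries, shard size grows linearly with the index and eventually crosses the 50G guideline. The index sizes here are made-up examples:

```python
# Illustrative only: with a fixed primary shard count, shard size grows
# with the index and eventually passes the 50G guideline.
PRIMARIES = 30  # fixed at index creation, per the point above

def shard_size_gb(total_index_size_gb):
    """Average data per primary shard for a given total index size."""
    return total_index_size_gb / PRIMARIES

print(shard_size_gb(1200))  # 40.0 GB per shard: still within the guideline
print(shard_size_gb(1800))  # 60.0 GB per shard: past the 50G mark, and the
                            # only ways out are deletes or a full reindex
```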

From what I've experienced there is much good to be said for having several smaller indices rather than one large:

  • It makes it easier to delete or retire old data when it is no longer in use. Rather than running expensive deletes in a big index, one can simply delete or retire the entire old index in one operation.

  • You can have different numbers of replicas for different indices, with more replicas for new data to allow better query performance (queries are run against both primary and replica shards) and fewer for old data (in order to save disk space). While the number of primary shards is written in stone, you can dynamically change the number of replicas in an index.

  • If for some reason you need to reindex, it's easier and probably safer to reindex one small index at a time rather than one big one. For instance, if you're already using more than 50% of the available disk on the nodes, reindexing a single big index may run out of disk space before it can finish.

  • You can move an index to specific nodes, allowing, say, the less-used indices to reside on fewer or slower nodes than the hotter indices. That way you can balance your cluster, boosting its handling of the hottest data. See the hot-warm architecture here
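The last two points in the list above correspond to per-index settings that can be updated at any time. A minimal sketch, assuming a custom node attribute named "box_type" has been configured in elasticsearch.yml (the attribute name is an assumption; any name works as long as it matches your node config):

```python
# Hypothetical sketch of retiring an old index: drop replicas and pin it
# to "warm" nodes. "box_type" is an assumed custom node attribute.

def retire_index_settings(replicas=0, box_type="warm"):
    """Settings body you could apply to an old, rarely-queried index."""
    return {
        "index": {
            "number_of_replicas": replicas,
            "routing.allocation.require.box_type": box_type,
        }
    }

settings = retire_index_settings()
print(settings)
```

Applying such a body to last year's index frees disk on the hot nodes without deleting any data.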

So, both from a performance perspective (as the amount of data grows and you need more primary shards) and from a management perspective, I would still recommend using several smaller indices rather than one large one.

Good luck!


Many thanks, Bernt, for the detailed comment; I've gained very good insight from it.
