Too big a shard vs Too many shards

Bernt_Rostad · February 21, 2017, 4:44pm

Yes, in many ways an index is a "management thing", and it's certainly true, as you point out, that searching 1 index of 50 shards is exactly equivalent to searching 50 indices with 1 shard each. However, having just one big index affects both the performance and the management of the cluster.

Data is always indexed into primary shards, and the number of primary shards is written in stone once the index has been created; it can never be changed for that index. This means you can't grow your cluster dynamically by increasing the number of primary shards (and nodes) later on. If you start out with 30 primary shards, it won't help to go beyond 30 nodes in the cluster, because you won't get any more primary shards (the surplus nodes may still improve searches, though, since they can hold replica shards). So over time each shard will grow beyond the 50GB limit, and there isn't much you can do except run expensive deletes in the index or perform a full reindex into a new index configured with more primary shards.
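To put numbers on the 30-shard example, here is a small Python sketch of the arithmetic. It assumes documents spread roughly evenly across the primaries (default routing hashes the document ID) and treats 50GB as the per-shard ceiling; the function names are hypothetical.

```python
import math


def avg_shard_size_gb(total_data_gb: float, primary_shards: int) -> float:
    """Average primary shard size, assuming data is spread evenly over primaries."""
    return total_data_gb / primary_shards


def primaries_needed(total_data_gb: float, max_shard_gb: float = 50.0) -> int:
    """Smallest primary shard count that keeps the average shard at or below max_shard_gb."""
    return max(1, math.ceil(total_data_gb / max_shard_gb))


if __name__ == "__main__":
    # With primaries fixed at 30, shard size grows linearly with the data volume.
    for total in (900, 1500, 1800, 3000):
        print(total, "GB ->", avg_shard_size_gb(total, 30), "GB per shard,",
              "would need", primaries_needed(total), "primaries")
```

At 1800GB the 30 fixed primaries average 60GB each, while staying under 50GB would take 36 primaries, which an existing index cannot gain without a full reindex.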

From what I've experienced, there is much good to be said for having several smaller indices rather than one large one:

It makes it easier to delete or retire old data when it's no longer in use. Rather than running expensive deletes in a big index, one can simply delete or retire the entire old index in one operation.
You can have a different number of replicas for each index, with more replicas for new data to allow better query performance (queries are run against both primary and replica shards) and fewer for old data (in order to save disc space). While the number of primary shards is written in stone, you can change the number of replicas of an index dynamically.
If for some reason you need to reindex, it's easier and probably safer to reindex one small index at a time rather than one big one. For instance, if you're already using more than 50% of the available disc on the nodes, reindexing a big index may run out of disc space before it can finish, since the old and new copies have to coexist until the reindex completes.
You can move an index to specific nodes, allowing, say, the less-used indices to reside on fewer or slower nodes than the hotter indices. That way you can balance your cluster and boost its handling of the hottest data. See the hot-warm architecture here: https://www.elastic.co/blog/hot-warm-architecture-in-elasticsearch-5-x
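The management points above can be sketched as a hypothetical daily-index lifecycle in Python. It only builds `(method, path, body)` tuples for the Elasticsearch REST calls rather than sending them, and several things are assumptions, not part of the original advice: the `logs-YYYY.MM.DD` naming, the 7/30-day thresholds, the replica counts, and a custom node attribute `box_type` set to `hot` or `warm` on the data nodes, in the style of the hot-warm blog post.

```python
from datetime import date, timedelta

INDEX_PREFIX = "logs-"  # hypothetical naming scheme: one index per day


def index_name(day: date) -> str:
    return f"{INDEX_PREFIX}{day:%Y.%m.%d}"


def create_body(primaries: int, replicas: int) -> dict:
    # Body for PUT /<index>; number_of_shards is fixed once the index exists.
    return {"settings": {"number_of_shards": primaries, "number_of_replicas": replicas}}


def reindex_request(src: str, dest: str) -> tuple:
    # Reindex one small index at a time instead of one huge index.
    return ("POST", "/_reindex", {"source": {"index": src}, "dest": {"index": dest}})


def plan_for(day: date, today: date, hot_days: int = 7, keep_days: int = 30) -> tuple:
    """Return the (method, path, body) to apply to one daily index, based on its age."""
    age = (today - day).days
    name = index_name(day)
    if age >= keep_days:
        # Retire old data by dropping the whole index: no per-document deletes.
        return ("DELETE", f"/{name}", None)
    if age >= hot_days:
        # Older data: fewer replicas (saves disc), moved to nodes tagged box_type=warm.
        return ("PUT", f"/{name}/_settings", {
            "index.number_of_replicas": 1,
            "index.routing.allocation.require.box_type": "warm",
        })
    # Recent data: an extra replica for query throughput, kept on hot nodes.
    return ("PUT", f"/{name}/_settings", {
        "index.number_of_replicas": 2,
        "index.routing.allocation.require.box_type": "hot",
    })


if __name__ == "__main__":
    today = date(2017, 2, 21)
    for age in (0, 10, 45):
        print(plan_for(today - timedelta(days=age), today))
```

Running `plan_for` once a day over the existing indices drops whole indices past retention instead of deleting documents inside one big index, and because replicas and allocation are per-index settings, each day's data can be tuned independently.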

So, both from a performance perspective, as the amount of data grows and you need more primary shards, and from a management perspective, I would still recommend using several smaller indices rather than one large one.

Good luck!
