Shards vs indexes: which to use when for chunkable data?

Hopefully a different take on an old question...

I'm wondering whether there's a known formula for choosing between many indexes with, say, one shard each and one index with many shards (or some point in between), for the case where it's trivial to determine which node/shard should index or serve the data (i.e., the data is chunkable).

Specifically, my data is routed via a (spatial + data type) hash, which is easy to divide across indexes and/or shards.

Is there any technical advantage or disadvantage to, say, 1 index with 64 shards vs. 64 indexes with 1 shard each? AFAIK, increasing the node count would automatically redistribute the shards either way.

This would include scaling, memory, and performance considerations.
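For concreteness, here is a minimal sketch of how such a hash could address either layout (the key format, function names, and hash scheme are my own illustration, not the actual routing described above):

```python
import hashlib

N_BUCKETS = 64  # 64 indexes x 1 shard, or 1 index x 64 shards

def bucket(spatial_key: str, data_type: str) -> int:
    """Hash (spatial key, data type) into one of 64 buckets.
    The hash scheme is illustrative only."""
    digest = hashlib.md5(f"{spatial_key}:{data_type}".encode()).digest()
    return digest[0] % N_BUCKETS  # 256 % 64 == 0, so this stays uniform

# Layout A: 64 indexes, 1 shard each -- the bucket picks the index name.
def index_for(spatial_key: str, data_type: str) -> str:
    return f"data-{bucket(spatial_key, data_type):02d}"

# Layout B: 1 index, 64 shards -- the bucket becomes the routing value.
def routing_for(spatial_key: str, data_type: str) -> str:
    return str(bucket(spatial_key, data_type))
```

Either way the same bucket number steers a document to a single Lucene index; the difference is whether it is expressed as an index name or as a `routing` value.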

Not in my experience. And down at the Lucene level the two cases are pretty much the same: an Elastic shard is simply a Lucene index. So whether you have 64 shards in one big Elastic index or 64 Elastic indices with 1 shard each, you still end up with 64 Lucene indices. There could be some differences in cluster state, but I'm not sure what they would be.

For me, when designing cluster indices, I first focus on the 20-40 GB shard size rule. Let's say I need daily indices storing around 30 GB of data each; then I put 1 primary shard in the index template. If instead I need weekly indices of around 200 GB, I'll use between 5 and 7 primary shards in the template.
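That sizing arithmetic can be sketched as follows (the helper name and the 30 GB midpoint are my own choices within the 20-40 GB range above):

```python
import math

def primary_shards(expected_index_gb: float, target_shard_gb: float = 30.0) -> int:
    """Pick a primary shard count so each shard lands near the
    20-40 GB sweet spot (30 GB midpoint assumed here)."""
    return max(1, math.ceil(expected_index_gb / target_shard_gb))

print(primary_shards(30))   # daily index,  ~30 GB  -> 1 shard
print(primary_shards(200))  # weekly index, ~200 GB -> 7 shards
```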

Another thing to consider is the maximum number of shards per node, which can be computed from the 20-shards-per-GB-of-heap rule. If I assign 8 GB of Java heap space to each data node, I need to make sure that each node has fewer than 8 * 20 = 160 shards (primary + replica). If I get close to this number I simply add more data nodes to spread the shards more thinly (or I could increase the heap space, but that would add more workload to each data node, so I prefer the first solution).
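The same rule of thumb as a sketch (the function names and the 500-shard example are mine, not from the rule itself):

```python
import math

SHARDS_PER_GB_HEAP = 20  # rule of thumb: at most 20 shards per GB of JVM heap

def shard_budget(heap_gb: float) -> int:
    """Ceiling on shards (primary + replica) one data node should host."""
    return int(heap_gb * SHARDS_PER_GB_HEAP)

def nodes_needed(total_shards: int, heap_gb: float) -> int:
    """Minimum data nodes to stay under the per-node shard budget."""
    return math.ceil(total_shards / shard_budget(heap_gb))

print(shard_budget(8))       # 8 GB heap -> 160-shard budget per node
print(nodes_needed(500, 8))  # 500 shards across 8 GB-heap nodes -> 4 nodes
```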

For more index sharding tips, please read How many shards should I have in my Elasticsearch cluster?.
