Large shard size

caseydm · November 5, 2021, 12:14am

I am setting up a new Elasticsearch cluster that will hold around 300 million records in a single index to start. The documents are academic paper data. Right now I have two nodes, so I set my shards as 2 primary and 1 replica. But at 75 million records I am already at 55gb per shard with 4 shards total. This caused a warning in Elasticsearch for 'large shard size'.

From what I have read I should resolve this. But what is the best of these options?

Simply create more primary shards but keep the same number of nodes (I know I would have to re-index).
Increase number of nodes and add replicas (could be very expensive and somewhat wasteful)
Split the index using publication year or something like that. This would then lead to ~300 indexes rather than the one.

I need to search across all the data for most queries. Any suggestions?

warkolm · November 5, 2021, 12:17am

Given this is time based data, even if it's relatively static and not high velocity, I'd put things into yearly indices.

You can have more than 1 primary shard per host too

caseydm · November 5, 2021, 2:49am

Ok great I'm giving that a shot to see how it performs. One more question - since my use case is search heavy, what do you think of the idea of having 1 primary shard and 1 replica shard to begin with, then add replicas if I decide to increase my nodes? That way I reduce the overhead of having multiple primary shards.

warkolm · November 6, 2021, 4:57am

I guess that depends on how large the primaries end up, but it's a solid place to start.

system · December 4, 2021, 4:58am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Trying to optimize Elasticsearch cluster Elasticsearch	3	980	February 20, 2017
Correct number of shards for 5.3 TB indices Elasticsearch	10	2169	May 18, 2017
Too big a shard vs Too many shards Elasticsearch	7	37483	March 22, 2017
Max shard size for a very large single index Elasticsearch	5	1725	April 7, 2020
How many shards should I create in terms of an index across 20 data node? Elasticsearch	6	3746	July 21, 2017

Large shard size

Related topics