Hi there,
I have an Elastic Cloud cluster running v7.10.2 with three data nodes, three dedicated master nodes and three coordinating nodes.
One index is currently 150GB, spread across three primary shards (one per data node) with one replica each. At roughly 50GB per primary shard, it will soon exceed the recommended shard size range of 10GB to 65GB, and it is expected to grow by about 10x in the near future.
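For reference, the sizing arithmetic can be sketched as below. This is a minimal illustration, assuming the 65GB upper bound quoted above and a simple ceiling division; `shard_plan` is a hypothetical helper, not part of any Elastic tooling.

```python
import math


def shard_plan(index_gb: float, primaries: int, max_shard_gb: float) -> dict:
    """Return the per-primary size for the current layout, plus the fewest
    primaries that keep every primary at or below max_shard_gb
    (hypothetical helper, assumes data is spread evenly across primaries)."""
    return {
        "gb_per_primary": index_gb / primaries,
        "min_primaries": math.ceil(index_gb / max_shard_gb),
    }


# Today: 150GB over three primaries.
print(shard_plan(150, 3, 65))       # → {'gb_per_primary': 50.0, 'min_primaries': 3}
# After the expected ~10x growth (~1.5TB), still on three primaries.
print(shard_plan(150 * 10, 3, 65))  # → {'gb_per_primary': 500.0, 'min_primaries': 24}
```

Under these assumptions, a 10x larger index would need at least 24 primaries to stay under 65GB per shard.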
Data in the index is mutable, with bursty read/write patterns, and currently cannot be split into separate indices or moved to, e.g., warm storage. Neither ILM nor data streams are in use, since the data is mutable and not time series.
I'm unclear on the recommended practice for handling this growth. Is it advisable to simply increase the primary shard count to keep each shard within the recommended size, even though the number of data nodes will not increase?
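One practical detail when raising the primary count of an existing index: the primary count is fixed at index creation, so it has to be done either by reindexing into a new index or with the `_split` API (which requires the source index to be write-blocked first). The sketch below lists which primary counts a split could reach from three primaries. It assumes the 7.x default for `index.number_of_routing_shards` (which allows splitting by factors of 2 up to 1024 shards) has not been overridden; the helper names are hypothetical.

```python
def default_routing_shards(primaries: int) -> int:
    # Assumption: mirrors the 7.x default for index.number_of_routing_shards,
    # i.e. the primary count doubled as many times as fits under 1024 shards.
    log2_primaries = (primaries - 1).bit_length()
    num_splits = max(1, 10 - log2_primaries)
    return primaries << num_splits


def split_targets(primaries: int) -> list:
    # A split target must be a multiple of the current primary count
    # and a factor of the routing shard count.
    routing = default_routing_shards(primaries)
    return [
        n
        for n in range(primaries + 1, routing + 1)
        if n % primaries == 0 and routing % n == 0
    ]


print(split_targets(3))  # → [6, 12, 24, 48, 96, 192, 384, 768]
```

Under that assumption, ~20 primaries is not reachable via `_split` from three; 24 would be the nearest option, whereas a reindex could target any count.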
Assuming the nodes' CPU, memory and other resources can accommodate it, and the nodes stay comfortably within the recommended maximum total shard count, are there any disadvantages to having more primary shards of the same index on the same node?
Should I be concerned about a future 1TB index spanning ~20 primary shards (with one replica) on just three data nodes?
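To put that scenario in numbers, here is a rough sketch of the idealised layout. It assumes 1TB = 1000GB, data spread evenly across primaries, and shard copies balanced purely by count across the three data nodes (the real allocator also weighs other factors); `layout` is a hypothetical helper.

```python
def layout(index_gb: float, primaries: int, replicas: int, data_nodes: int) -> dict:
    """Idealised per-shard size and shard copies per node (hypothetical helper)."""
    total_copies = primaries * (1 + replicas)
    base, extra = divmod(total_copies, data_nodes)
    # The first `extra` nodes each hold one more copy than the rest.
    per_node = [base + 1] * extra + [base] * (data_nodes - extra)
    return {"gb_per_shard": index_gb / primaries, "shards_per_node": per_node}


print(layout(1000, 20, 1, 3))  # → {'gb_per_shard': 50.0, 'shards_per_node': [14, 13, 13]}
print(layout(1000, 24, 1, 3)["shards_per_node"])  # → [16, 16, 16]
```

With 20 primaries the 40 shard copies cannot divide evenly across three nodes, while a multiple of three such as 24 can.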
Note: I would like to add data nodes over time to distribute load (particularly during bursts), but the baseline load on these nodes is low, and Elastic Cloud has a strange limit whereby you cannot increase the data node count past three (one per availability zone) until you first pay for the largest available memory size:
- Note that to increase the number of nodes assigned to an instance configuration you must first scale up to the maximum RAM for that instance type. For example, if the maximum value on the RAM per Node slider for your Elasticsearch data node is 64GB, you need to scale up to that value before you can add additional nodes.
Source: Customize Your Deployment