I have an index that contains 1 terabyte of events. The index currently has 1 shard and 1 replica.
I want to change the index settings and increase both the number of shards and replicas, but I don't know the optimal size for a single shard. I want the queries to be very fast.
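For reference, this is roughly what I had in mind (the index name and target shard count are just placeholders). As I understand it, the replica count can be changed on a live index, but the shard count cannot, so a split or a reindex would be needed:

PUT /events/_settings
{
  "index": { "number_of_replicas": 2 }
}

# the source index must be write-blocked before it can be split
PUT /events/_settings
{
  "index.blocks.write": true
}

POST /events/_split/events-split
{
  "settings": { "index.number_of_shards": 8 }
}

The 8 shards above are just an example; that is exactly the number I am unsure about.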
The ideal shard size will depend on a number of factors, so there is not necessarily any single shard size that is optimal across the board. Some of these factors are:
Type and complexity of data and mappings
Type of queries used and size of result set returned
Number of concurrent queries
Level of concurrent indexing
Cluster size
Type of hardware used
It would help if you could shed some light on these. It would also be useful if you could tell us which version of Elasticsearch you are using.
The data and mappings can be considered complex—extremely complex, in fact—due to heavy use of nested structures and a very large number of fields (more than 1,000).
The data itself is of mixed types, including text, numbers, and a wide range of special characters.
Query patterns in use (a representative combination is sketched after the list):
exists
wildcard — used heavily; almost every query includes at least one wildcard
term
match
match_all
must
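To make that concrete, a typical query looks roughly like this (the field names here are illustrative, not our real mapping):

GET /events/_search
{
  "query": {
    "bool": {
      "must": [
        { "term": { "event.type": "error" } },
        { "exists": { "field": "user.id" } },
        { "wildcard": { "message": { "value": "*timeout*" } } },
        { "match": { "description": "connection reset" } }
      ],
      "filter": [
        { "range": { "@timestamp": { "gte": "now-7d/d" } } }
      ]
    }
  }
}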
The result set size for some queries (e.g., last 7 days of data) can reach up to 666 million documents.
Query concurrency:
The number of concurrent queries varies; currently up to 19, sometimes more, sometimes fewer.
That is clearly not a standard use case, so I believe you will need to do some testing and benchmarking in order to determine the ideal shard size. I would use the size range Leandro mentioned as a starting point.
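As part of that benchmarking, the search Profile API can show where the time is actually spent. A minimal sketch (index and field names are illustrative):

GET /events/_search
{
  "profile": true,
  "query": {
    "wildcard": { "message": { "value": "*timeout*" } }
  }
}

The profiled response breaks the query down per shard and per Lucene query, which should confirm whether the wildcard clauses dominate the query time.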
It seems that you are using features and query types, e.g. wildcard, that are known to be heavy and slow. Combine this with large result sets and I doubt you will be able to make the queries very fast. Since you have large result sets, make sure you have the fastest storage possible, ideally fast local SSDs.
Storage is likely going to be your first/main bottleneck. HDDs do not sound at all suitable for your use case.
What kind of data is this? Is it time series (not necessarily logs or metrics, but data where time is a primary component and/or a common query filter)? If so, I would break up the index into multiple indices. I would probably do that even if it is not time series, for better manageability...
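A minimal sketch of what I mean, assuming monthly indices (the names and the template pattern are illustrative):

# one composable template covers all the time-based indices
PUT /_index_template/events-template
{
  "index_patterns": ["events-*"],
  "template": {
    "settings": {
      "number_of_shards": 2,
      "number_of_replicas": 1
    }
  }
}

PUT /events-2024-01
PUT /events-2024-02

# searches can target all the indices matching the pattern
GET /events-*/_search
{
  "query": {
    "range": { "@timestamp": { "gte": "now-7d/d" } }
  }
}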
If not, is there another low-cardinality dimension you could break the data up by, using constant_keyword fields? That could also greatly improve performance...
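For example, if the data could be split by something like data source or tenant, each index could declare that dimension as a constant_keyword, so a filter on it can skip non-matching indices entirely (the names here are purely illustrative):

PUT /events-billing
{
  "mappings": {
    "properties": {
      "dataset": {
        "type": "constant_keyword",
        "value": "billing"
      }
    }
  }
}

# a term filter on the constant_keyword lets non-matching indices be skipped
GET /events-*/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "dataset": "billing" } }
      ]
    }
  }
}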
Either approach could greatly improve manageability and query performance if done and configured correctly.
If this is a long-term plan, I would start from the ground up...