How many shards should I choose?

Hi, we have a problem on our project and I am trying to understand why it is happening.

One of the factors I'm considering is the number of shards relative to the index size: an index can weigh around 200GB on average, with only 1 shard. How can this affect performance?

The error we get:
[parent] Data too large, data for [indices:data/write/bulk[s]] would be [27047392520/25.1gb], which is larger than the limit of [26521423052/24.6gb], real usage: [27047391936/25.1gb], new bytes reserved: [584/584b], usages [request=0/0b, fielddata=0/0b, in_flight_requests=584/584b, model_inference=0/0b, accounting=216717508/206.6mb]

Welcome!

You normally don't want to exceed 50GB per shard.
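You can see how big each shard currently is with something like this (USER, PASSWORD and ESHOST are placeholders as in the other commands in this thread, and INDEXNAME is whatever your index is called):

curl -sk -u USER:PASSWORD "https://ESHOST:9200/_cat/shards/INDEXNAME?v&h=index,shard,prirep,store"

The store column is the on-disk size of each shard, so a 200GB index with one primary will show up well past that 50GB guideline.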

Please read

Welcome to the forum @Zoree

It's always helpful to include as much info as possible on your setup, e.g. how many nodes, the hardware specs or resources allocated to the nodes, which version of Elasticsearch, a simple one-sentence idea of what your cluster does (logs, security, whatever), the ingest pattern, average document sizes, ... The _cat commands below will give a quick overview of some of that.
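Something like this (same placeholders as above; the column list is just a suggestion) shows the nodes and the indices sorted by size:

curl -sk -u USER:PASSWORD "https://ESHOST:9200/_cat/nodes?v&h=name,node.role,heap.max,ram.max"
curl -sk -u USER:PASSWORD "https://ESHOST:9200/_cat/indices?v&s=store.size:desc"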

My understanding of what you wrote is that you have a number of indices (how many is not given) whose sizes average around 200GB per index, so some bigger and some smaller, and each index has one primary shard and an unknown number of replica shards. You have also tried to bulk ingest 25.1GB of data in one call, which has failed as it's bigger than an Elasticsearch limit. That's the error.

If I've understood wrong, please correct me.

If you want to get past the error without changing anything else, then break the ingest up into smaller chunks, both now and on an ongoing basis. Personally, I think that would be a sensible thing to do anyway.
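As a rough sketch of what "smaller chunks" could look like: if your bulk payload is a single NDJSON file of index actions (one action line plus one source line per document), you can split it on an even line count so the pairs stay together, then send each piece as its own _bulk request. bulk-payload.ndjson is a made-up filename here:

# split the payload into 50,000-line pieces (an even count keeps action+source pairs intact)
split -l 50000 bulk-payload.ndjson chunk-
# send each piece as a separate _bulk request
for f in chunk-*; do
  curl -sk -u USER:PASSWORD -H 'Content-Type: application/x-ndjson' \
       -XPOST "https://ESHOST:9200/_bulk" --data-binary "@$f"
done

If you are using a client library instead, most of them have a bulk helper with a chunk-size option that does the same thing.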

The limits can be seen with

curl -sk -u USER:PASSWORD https://ESHOST:9200/_nodes/stats/breaker

I think there is a way to increase the specific limit, but I'd rather know more about what you are doing before going there.
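For reference, the limit in your error is the parent circuit breaker, controlled by the dynamic cluster setting indices.breaker.total.limit; it defaults to 95% of heap when real-memory checking is on, which looks consistent with the 24.6gb limit in your error if the node has a 26GB heap. Something like this would raise it, but that mostly trades the error for a higher out-of-memory risk, so smaller bulks and/or more heap are the better fix:

curl -sk -u USER:PASSWORD -H 'Content-Type: application/json' \
     -XPUT "https://ESHOST:9200/_cluster/settings" -d '
{
  "persistent": {
    "indices.breaker.total.limit": "98%"
  }
}'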

In the Elasticsearch documentation on sizing your shards, there's a section titled:

"Aim for shards of up to 200M documents, or with sizes between 10GB and 50GB"

for which the one-line summary is "Very large shards can slow down search operations and prolong recovery times after failures".
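If you do want to move to more, smaller shards, one way is to create a new index with more primaries and _reindex into it; 8 primaries on ~200GB of data lands around 25GB per shard, inside that 10GB-50GB band. The index names here are hypothetical:

# my-index is the existing 200GB index, my-index-v2 the replacement with 8 primaries
curl -sk -u USER:PASSWORD -H 'Content-Type: application/json' \
     -XPUT "https://ESHOST:9200/my-index-v2" -d '{"settings":{"number_of_shards":8}}'
curl -sk -u USER:PASSWORD -H 'Content-Type: application/json' \
     -XPOST "https://ESHOST:9200/_reindex" -d '{"source":{"index":"my-index"},"dest":{"index":"my-index-v2"}}'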


Thanks, I found some information about shard sizing and the performance issue.