Elasticsearch coordinating node OOM/crash under sustained ingest (60k docs/min) with high shard count (~800)

Osmel_Pillot_Leyva · June 4, 2026, 3:51am

I have an Elasticsearch cluster with 3 master nodes, 1 data node, 1 ingest node, and 1 coordinating node. I also have 1 Kibana instance and 1 Fleet Server. Data ingestion is performed through approximately 12 Elastic Agents managed by Fleet. In the Fleet Server settings in Kibana, the server is pointing to the Fleet Server, and the output is configured to send data to the coordinating node.

Currently, the cluster has around 800 shards, although disk usage is low (about 50 GB on the data node), so this does not appear to be a storage capacity issue. The average ingestion volume is around 60,000 documents per minute across all agents.

The issue is that after 2–3 days of continuous operation, the coordinating node crashes (typically associated with memory pressure), causing instability in the cluster. I am trying to determine whether the root cause is related to the ingestion architecture (all traffic going through the coordinating node), the high number of shards, or a combination of both factors.

DavidTurner · June 5, 2026, 8:57am

Unfortunately there's not really a way to answer this from the information provided. We have clusters running for months under much heavier load without seeing any problems like this. You will need to look at the heap dump to work out what consumed all the memory.
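While waiting for the next crash, it may also help to track heap usage over time, since a node that fails after 2–3 days suggests slow growth rather than a sudden spike. The following is a minimal sketch, not a replacement for heap dump analysis: it assumes you have already fetched and parsed the JSON body of `GET _nodes/stats/jvm`, and the function names, the 85% threshold, and the sample data are made up for illustration.

```python
from typing import Any


def heap_usage_by_node(stats: dict[str, Any]) -> dict[str, int]:
    """Map node name -> heap_used_percent from a parsed _nodes/stats/jvm body."""
    usage: dict[str, int] = {}
    for node_id, node in stats.get("nodes", {}).items():
        # Fall back to the node ID if the name is missing from the response.
        name = node.get("name", node_id)
        usage[name] = node["jvm"]["mem"]["heap_used_percent"]
    return usage


def nodes_over_threshold(stats: dict[str, Any], threshold_percent: int = 85) -> list[str]:
    """Return the sorted names of nodes at or above the heap threshold."""
    return sorted(
        name
        for name, percent in heap_usage_by_node(stats).items()
        if percent >= threshold_percent
    )


# Made-up sample shaped like the nodes stats response (hypothetical node names).
sample = {
    "nodes": {
        "abc123": {"name": "coord-1", "jvm": {"mem": {"heap_used_percent": 91}}},
        "def456": {"name": "data-1", "jvm": {"mem": {"heap_used_percent": 47}}},
    }
}

print(heap_usage_by_node(sample))
print(nodes_over_threshold(sample))
```

Running something like this on a schedule and recording the results would show whether the coordinating node's heap climbs steadily between restarts.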

This setup is definitely not resilient and seems overly complex for your needs. You would be better served with 3 nodes that just do everything.

Osmel_Pillot_Leyva · June 8, 2026, 4:13am

Hi,

Thanks for your previous response.

I’ve been reviewing my cluster configuration in more detail, and I noticed something that might be relevant. Currently, my cluster has around 800 shards in total, while the actual data volume is relatively small (around 50 GB on the data node). This results in very small shard sizes.
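To put rough numbers on this, here is a small arithmetic sketch. It assumes, as a simplification, that the ~50 GB is spread evenly across all ~800 shards; the helper names are hypothetical.

```python
def average_shard_size_mb(total_gb: float, shard_count: int) -> float:
    """Mean on-disk size per shard in MB, taking 1 GB = 1024 MB."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    return total_gb * 1024 / shard_count


def docs_per_second(docs_per_minute: float) -> float:
    """Convert an ingest rate from documents per minute to per second."""
    return docs_per_minute / 60


print(average_shard_size_mb(50, 800))  # 64.0
print(docs_per_second(60_000))         # 1000.0
```

So under that assumption the average shard is about 64 MB, and the ingest rate works out to roughly 1,000 documents per second.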

From what I’ve been reading, this seems far from recommended shard sizing guidelines, and I’m starting to suspect that the high shard count could be putting additional pressure on heap memory—especially on the coordinating node, which is handling all ingestion traffic.

Would it be reasonable to consider the number of shards as a primary root cause of the memory issues I’m experiencing on the coordinating node?

Also, as context: I’m relatively new to Elasticsearch, and this is actually my first real-world cluster deployment after graduating, so I’m still building a solid understanding of best practices. I’d really appreciate any guidance on whether I should prioritize reducing shard count versus redesigning the node roles.

Thanks again for your time.

RainTown · June 9, 2026, 10:14am

My suggestion here is to simplify. You note you are relatively new, so start simple.

Simplest is one node that does everything.

Next up is a 3-node cluster, where every node does everything, as per @DavidTurner.

As to number of shards, why do you have 800 shards? How many indices? How many indices are being actively written to?

In a small cluster, just 1 primary shard and 1 replica per index are often enough.
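One way to answer the questions above is to tally the output of `GET _cat/shards?format=json`. This is a rough sketch that assumes the response has already been parsed into a list of dicts with the `index` and `prirep` fields; the function names and the sample index names are invented for illustration.

```python
from collections import Counter


def shards_per_index(rows: list[dict]) -> Counter:
    """Count shard copies (primaries and replicas) per index."""
    return Counter(row["index"] for row in rows)


def primary_and_replica_totals(rows: list[dict]) -> tuple[int, int]:
    """Return (primaries, replicas) based on the 'prirep' column ('p' or 'r')."""
    primaries = sum(1 for row in rows if row["prirep"] == "p")
    replicas = sum(1 for row in rows if row["prirep"] == "r")
    return primaries, replicas


# Made-up rows shaped like _cat/shards?format=json output (hypothetical indices).
rows = [
    {"index": "logs-a", "shard": "0", "prirep": "p", "state": "STARTED"},
    {"index": "logs-a", "shard": "0", "prirep": "r", "state": "UNASSIGNED"},
    {"index": "logs-b", "shard": "0", "prirep": "p", "state": "STARTED"},
]

print(shards_per_index(rows).most_common())
print(primary_and_replica_totals(rows))
```

With a single data node, replica copies cannot be allocated alongside their primaries, so a breakdown like this would also show how many of the ~800 shards are unassigned replicas.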

DavidTurner · June 9, 2026, 1:32pm

The shard count is not ideal and is worth fixing, but it wouldn't explain the symptoms you described in the OP. If the cluster were going to fail because of this, it'd do so immediately.
