Unexpected high memory usage in Elasticsearch cluster – looking for optimization advice

I’m running an Elasticsearch cluster in a production environment and recently noticed consistently high memory usage across all data nodes. Even during periods of low query activity, heap usage remains elevated and occasionally triggers long GC pauses.

Environment details:

  • Elasticsearch version: 8.x

  • Cluster size: 3 data nodes, 1 master

  • Each data node: 16 GB RAM, heap set to 8 GB

  • Primary use-case: log ingestion and search (via Filebeat + Kibana)

Symptoms:

  • Heap usage rarely drops below ~75%

  • Occasional slow searches during peak ingestion

  • GC logs show frequent old-gen collections

What I’ve checked so far:

  • No unusually large aggregations running

  • Shard count appears reasonable

  • Fielddata cache not heavily used

  • Circuit breakers not being triggered

I’d appreciate any guidance on:

  • Common causes of sustained high heap usage

  • Recommended tuning steps or metrics I should inspect

  • Whether this could be related to segment merging or mapping design

I’m happy to share additional logs or stats if needed.

Exactly which version are you using?

Running a cluster with a single master node is not recommended, as it gives you limited resiliency and no high availability. You should always aim to have 3 master-eligible nodes in any cluster larger than a single node.
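To illustrate why 3 is the usual recommendation, here is a rough sketch of the majority arithmetic behind master elections. It is a simplification: Elasticsearch manages its voting configuration itself, and the function names below are made up for illustration.

```python
def quorum_size(master_eligible: int) -> int:
    """Smallest majority of the master-eligible (voting) nodes."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """Master-eligible nodes that can be lost while a majority remains."""
    return master_eligible - quorum_size(master_eligible)


# Compare the current setup (1) with the recommended one (3).
for n in (1, 2, 3):
    print(f"{n} master-eligible: quorum={quorum_size(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
```

With one or two master-eligible nodes, no failure can be tolerated; with three, the cluster can lose one and still elect a master.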

How much disk space does each data node have? What type of storage are you using?

What is the full output of the cluster stats API?
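If it helps, below is a minimal sketch of how the output could be fetched and boiled down to the numbers most relevant here. It assumes an endpoint reachable without authentication or TLS, which is not the default in 8.x, so you would need to add credentials for a secured cluster. The helper names and the sample values are made up; the field paths are the ones I would expect in a `GET /_cluster/stats` response.

```python
import json
import urllib.request


def fetch_cluster_stats(base_url: str) -> dict:
    """GET /_cluster/stats (assumption: no auth or TLS on this endpoint)."""
    with urllib.request.urlopen(f"{base_url}/_cluster/stats") as resp:
        return json.load(resp)


def summarize(stats: dict) -> dict:
    """Pick out shard, segment, heap and mapping figures from cluster stats."""
    indices = stats.get("indices", {})
    mem = stats.get("nodes", {}).get("jvm", {}).get("mem", {})
    heap_max = mem.get("heap_max_in_bytes", 0)
    heap_used = mem.get("heap_used_in_bytes", 0)
    field_types = indices.get("mappings", {}).get("field_types", [])
    return {
        "indices": indices.get("count", 0),
        "shards": indices.get("shards", {}).get("total", 0),
        "segments": indices.get("segments", {}).get("count", 0),
        "heap_used_pct": round(100 * heap_used / heap_max, 1) if heap_max else None,
        "field_types": {ft["name"]: ft["count"] for ft in field_types},
    }


# Made-up sample in the shape of a cluster stats response (not real data).
sample = {
    "indices": {
        "count": 120,
        "shards": {"total": 240},
        "segments": {"count": 3100},
        "mappings": {"field_types": [
            {"name": "keyword", "count": 900, "index_count": 120},
            {"name": "text", "count": 400, "index_count": 120},
        ]},
    },
    "nodes": {"jvm": {"mem": {
        "heap_used_in_bytes": 18 * 1024**3,
        "heap_max_in_bytes": 24 * 1024**3,
    }}},
}
print(summarize(sample))
```

Against a live cluster you would call `summarize(fetch_cluster_stats("http://localhost:9200"))`, where the URL is a placeholder. Please still post the full output, since the summary leaves out a lot.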

Heap usage that rarely drops below ~75% often indicates that you either need to store the data in Elasticsearch more efficiently or increase RAM and heap.

Slow searches during peak ingestion can be an indication of slow storage, which is why I asked about storage above. Try running iostat -x on the nodes during heavy load to see exactly what is going on.

Do you have swapping enabled? If not, is the cluster deployed on VMs that may not have enough RAM and use swapping behind the scenes?
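As a quick way to check swapping from inside a Linux node, a small sketch like the following parses the contents of `/proc/meminfo`. The function name and sample text are made up for illustration. Note that it only sees swap inside the guest OS; swapping or ballooning done by a hypervisor will not show up here. Separately, `GET _nodes?filter_path=**.mlockall` reports whether `bootstrap.memory_lock` took effect.

```python
def swap_usage_kb(meminfo_text: str) -> tuple[int, int]:
    """Return (swap_total_kb, swap_used_kb) from /proc/meminfo content."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        # Lines look like "SwapTotal:  2097148 kB"; keep the numeric value.
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    total = fields.get("SwapTotal", 0)
    free = fields.get("SwapFree", 0)
    return total, total - free


# Made-up sample; on a real node, read the text from /proc/meminfo instead.
sample_meminfo = (
    "MemTotal:       16384000 kB\n"
    "SwapTotal:       2097148 kB\n"
    "SwapFree:        2000000 kB\n"
)
total_kb, used_kb = swap_usage_kb(sample_meminfo)
print(f"swap total={total_kb} kB, used={used_kb} kB")
```

A non-zero total means swap is enabled on the host, and a used value that grows under load suggests the JVM heap may be getting paged out, which would fit the long GC pauses.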

What is your definition of a reasonable shard count? The output of the cluster stats API I asked for above will provide stats around this, as well as an indication of the type of mappings used.
