I’m running an Elasticsearch cluster in a production environment and recently noticed consistently high memory usage across all data nodes. Even during periods of low query activity, heap usage remains elevated and occasionally triggers long GC pauses.
Environment details:
- Elasticsearch version: 8.x
- Cluster size: 3 data nodes, 1 master
- Each data node: 16 GB RAM, heap set to 8 GB
- Primary use-case: log ingestion and search (via Filebeat + Kibana)
Symptoms:
- Heap usage rarely drops below ~75%
- Occasional slow searches during peak ingestion
- GC logs show frequent old-gen collections
What I’ve checked so far:
- No unusually large aggregations running
- Shard count appears reasonable
- Fielddata cache not heavily used
- Circuit breakers not being triggered
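
For anyone who wants to compare numbers, here is a rough sketch of how the checks above could be pulled from the node stats API and condensed per node. The base URL, the lack of authentication, and the helper names `fetch_node_stats` and `summarize_node_stats` are assumptions for illustration, not something taken from my actual setup; a secured 8.x cluster would need HTTPS and credentials.

```python
import json
from urllib.request import urlopen


def fetch_node_stats(base_url="http://localhost:9200"):
    """Fetch JVM, index and breaker stats for every node.

    The URL is an assumption; a secured cluster needs HTTPS and auth.
    """
    with urlopen(f"{base_url}/_nodes/stats/jvm,indices,breaker") as resp:
        return json.load(resp)


def summarize_node_stats(stats):
    """Reduce a node stats response to the heap-related fields per node."""
    rows = []
    for node in stats.get("nodes", {}).values():
        jvm = node.get("jvm", {})
        old_gc = jvm.get("gc", {}).get("collectors", {}).get("old", {})
        indices = node.get("indices", {})
        breakers = node.get("breakers", {})
        rows.append({
            "name": node.get("name"),
            "heap_used_percent": jvm.get("mem", {}).get("heap_used_percent"),
            "old_gc_count": old_gc.get("collection_count"),
            "old_gc_time_ms": old_gc.get("collection_time_in_millis"),
            "fielddata_bytes": indices.get("fielddata", {}).get("memory_size_in_bytes"),
            "segment_count": indices.get("segments", {}).get("count"),
            # Total trips across all circuit breakers on this node.
            "breakers_tripped": sum(b.get("tripped", 0) for b in breakers.values()),
        })
    return rows
```

Running `summarize_node_stats(fetch_node_stats())` a few minutes apart should show whether the old-gen collection count keeps climbing while heap usage stays high, which is the pattern described in the symptoms.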
I’d appreciate any guidance on:
- Common causes of sustained high heap usage
- Recommended tuning steps or metrics I should inspect
- Whether this could be related to segment merging or mapping design
I’m happy to share additional logs or stats if needed.