Hi everyone,
I would like to clarify a memory usage scenario in our Elasticsearch cluster. Internally there is some debate about whether it represents real memory pressure or just normal Linux filesystem cache behavior.
Environment:

- Elasticsearch 9.3.1
- 3-node cluster
- Ubuntu 22.04
- 32 GB RAM per node
From Dev Tools, we see the following:
```
GET _cat/nodes?v&h=name,heap.percent,ram.percent,cpu,load_1m,node.role,master

name      heap.percent ram.percent cpu load_1m node.role   master
elastic-2           33          98  64    2.84 cdfhilmrstw *
elastic-3           79          98  41    3.46 cdfhilmrstw -
elastic-1           28          98  82    2.20 cdfhilmrstw -
```
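To see where `ram.percent` comes from on each node, the nodes stats API exposes the raw OS memory figures next to the JVM heap. As far as I understand, `ram.percent` reflects `os.mem.used_percent`, which counts the page cache as used; the query below is just one way to compare the two per node:

```
GET _nodes/stats/os,jvm?filter_path=nodes.*.name,nodes.*.os.mem,nodes.*.jvm.mem.heap_used_percent
```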
At the OS level (`free -h`), we still observe significant available memory and no swap usage:
```
$ free -h
               total        used        free      shared  buff/cache   available
Mem:            31Gi        19Gi       849Mi       1.0Mi        11Gi        11Gi
Swap:             0B          0B          0B
```
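As a cross-check at the OS level, `MemAvailable` in `/proc/meminfo` is the kernel's own estimate of memory usable without swapping, already discounting reclaimable page cache; it is the same figure `free -h` reports as "available":

```
# MemAvailable: kernel estimate of memory new workloads can use without swapping
# (reclaimable page cache is counted as available)
grep -E '^(MemTotal|MemAvailable|Cached)' /proc/meminfo
```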
The concern internally is that `ram.percent` close to 100% may indicate Elasticsearch memory saturation.
However, what stands out to me is:

- all nodes report `ram.percent=98` (a quick sanity check after this list shows why)
- yet heap usage differs widely between nodes (28%, 33%, 79%)
- swap is unused
- Linux still reports ~11 GiB of available memory
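Plugging the `free -h` numbers into the formula I believe `ram.percent` uses, `(total - free) / total` (my assumption; it treats page cache as used), reproduces the reported figure:

```
# Using the free -h values above (GiB); assumes ram.percent = (total - free) / total
awk 'BEGIN { total = 31; free = 0.83; printf "ram.percent ≈ %.0f%%\n", (total - free) / total * 100 }'
# → ram.percent ≈ 97%, matching the reported 98% within the rounding of free -h
```

So the 98% looks consistent with page cache simply being counted as used memory.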
My current understanding is:

- `ram.percent` includes Linux filesystem/page cache usage
- Linux intentionally uses otherwise-idle RAM for cache and reclaims it when applications need it
- a high `ram.percent` alone therefore does not necessarily indicate harmful memory pressure
- `heap.percent`, swap activity, GC behavior, OOM events, and available memory are likely more relevant indicators of Elasticsearch health (see the stats query after this list)
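On the GC point: the same nodes stats API exposes per-collector counts and times, and a steadily climbing old-generation collection time would be a far stronger pressure signal than `ram.percent`. A minimal query for that:

```
GET _nodes/stats/jvm?filter_path=nodes.*.name,nodes.*.jvm.mem.heap_used_percent,nodes.*.jvm.gc.collectors
```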
Questions:

- Is it correct to interpret a high `ram.percent` by itself as normal behavior on Linux-based Elasticsearch nodes?
- In practice, which metrics do you consider most reliable for determining real memory pressure that actually affects Elasticsearch performance?
- Would you consider this scenario healthy, given that swap is unused and available memory remains high?
Thanks in advance.