I am aware of the general recommendation of the 50% rule, where Xms/Xmx should be set to 50% of the available RAM. Elasticsearch uses the other, non-heap 50% for the page cache, which is managed at the kernel level.
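As a rough sketch, the rule of thumb can be expressed as a tiny shell helper. This is hypothetical, for illustration only; the 26 GiB cap reflects the common extra guideline of staying below the JVM's compressed-oops threshold:

```shell
# Hypothetical helper: derive -Xms/-Xmx from a container memory limit (in MiB),
# following the 50% rule and capping at 26 GiB so the JVM keeps using
# compressed object pointers. Illustration only, not an official tool.
heap_for_limit() {
    limit_mb=$1
    heap_mb=$((limit_mb / 2))
    cap_mb=$((26 * 1024))
    [ "$heap_mb" -gt "$cap_mb" ] && heap_mb=$cap_mb
    echo "-Xms${heap_mb}m -Xmx${heap_mb}m"
}

heap_for_limit 4096    # prints: -Xms2048m -Xmx2048m
```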
However, I am reading that in the case of containers, the page cache lives at the host level, since it is maintained by the kernel. So, does that mean containers can use memory in the form of page cache beyond their "limits"? And does that mean the 50% rule is not that meaningful when operating ES as a container?
So, it seems that even though the mmap mapping is created per container, they all map to the same underlying page cache.
In the context of Elasticsearch, there would never be a need for sharing page caches, but what I want to understand is the impact on the 50% rule if the page cache falls outside the realm of containers (and the container memory limits)?
I think we are talking about two different things: whether the memory needs to be allocated in the container vs. whether a memory-mapped file can be shared.
A small experiment with a container that has a 512M heap: Elasticsearch wouldn't even start for me with 850M for the entire container. Even with no data, memory use in this container is on the very high side; docker stats <id> puts me at around 93% memory usage. BTW, setting --memory-swap equal to --memory disables swap, since swap would only make this more confusing:
Run the container: docker run --memory=950M --memory-swap=950M -e ES_JAVA_OPTS="-Xms512m -Xmx512m" --publish 9200:9200 -it docker.elastic.co/elasticsearch/elasticsearch:8.5.2
Then you can check the memory usage with curl -k -u elastic "https://localhost:9200/_nodes/stats/jvm?human" | jq. Relevant part of the output:
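For reference, the heap numbers can be pulled out with a jq filter. The snippet below runs it against a trimmed, made-up response with the usual _nodes/stats shape (the node ID and values are placeholders; a 512m heap corresponds to a heap_max_in_bytes of 536870912):

```shell
# Made-up, trimmed example of the _nodes/stats/jvm response shape.
cat > /tmp/stats.json <<'EOF'
{
  "nodes": {
    "abc123": {
      "jvm": {
        "mem": {
          "heap_used_in_bytes": 320864256,
          "heap_max_in_bytes": 536870912
        }
      }
    }
  }
}
EOF

# Pull out just the heap figures for every node.
jq '.nodes[].jvm.mem | {heap_used_in_bytes, heap_max_in_bytes}' /tmp/stats.json
```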
I am referring to the page cache (aka the file system cache), not swap memory. Upon further reading, this is the conclusion I've come to:
50% rule still applies for Elasticsearch.
Page cache is counted against the container even though it is the kernel that manages it.
But the page cache is also shared across containers if the docker overlay2 storage driver is used, which I think is the default now. The gotcha is that the page cache is accounted in equal proportions when multiple containers access the same files. Since sharing files is not a scenario for Elasticsearch, we don't need to worry about its impact on memory calculations.
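That per-container accounting can be checked directly: from inside a container, the kernel reports how much of the cgroup's memory charge is actually page cache. A sketch, assuming Linux; the path and field name depend on whether the host runs cgroup v1 or v2:

```shell
# Sketch: report the page cache charged to the current cgroup.
# cgroup v2 exposes it as "file" in memory.stat; cgroup v1 as "cache".
if [ -r /sys/fs/cgroup/memory.stat ]; then
    # cgroup v2 (unified hierarchy)
    page_cache=$(awk '$1 == "file" {print $2}' /sys/fs/cgroup/memory.stat)
elif [ -r /sys/fs/cgroup/memory/memory.stat ]; then
    # cgroup v1
    page_cache=$(awk '$1 == "cache" {print $2}' /sys/fs/cgroup/memory/memory.stat)
else
    page_cache=""
fi
echo "page cache charged to this cgroup: ${page_cache:-unknown} bytes"
```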
Accounting for memory in the page cache is very complex. If two processes in different control groups both read the same file (ultimately relying on the same blocks on disk), the corresponding memory charge is split between the control groups. It’s nice, but it also means that when a cgroup is terminated, it could increase the memory usage of another cgroup, because they are not splitting the cost anymore for those memory pages.
I still lack some clarity on how this memory is reported, but I am convinced the 50% rule applies.
A data directory can only be used by a single Elasticsearch instance. Try using the same bind-mount in two containers and you'll see the second one fail because the data directory is already locked.
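The locking behavior can be simulated outside Elasticsearch. ES itself takes a file lock on node.lock inside the data path (via Java NIO, not flock); the sketch below fakes the same effect with flock, with a made-up path for illustration:

```shell
# Simulate ES's node.lock: the first "instance" takes an exclusive lock,
# the second fails to acquire it. (ES uses a Java NIO file lock, not flock,
# but the observable effect is the same.)
mkdir -p /tmp/esdata
exec 9>/tmp/esdata/node.lock        # first instance opens the lock file
flock -n 9 && echo "first instance: lock acquired"

# A second instance opening the same file cannot get the lock.
if flock -n /tmp/esdata/node.lock -c 'true'; then
    second="started"
else
    second="failed: data directory already locked"
fi
echo "second instance: $second"
```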
So, in the context of Elasticsearch, sharing an mmap'ed file isn't really a thing. And that's why the 50% rule (it's an approximation; it can even make sense to have less heap than that in some situations) applies just like with any other installation method.
Don't get too sidetracked by what mmap can theoretically do if it doesn't make sense in the context of Elasticsearch.