Memory leak when creating snapshots with frequent repository changes

Hi,

When using the snapshot feature with frequently changing repositories, we can see that files are mapped into memory but never unmapped. For example, adding a repository, taking a snapshot, and then removing the repository multiple times resulted in the same segment file being mapped multiple times in memory and never released. This leads to huge PageTables memory usage over time. Below is a rough sketch of the cycle, followed by the mappings after three iterations of create repo / take snapshot / delete repo.
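Roughly, each cycle looks like this as plain snapshot API calls (repository and bucket names here are placeholders, and auth/TLS is omitted for brevity):

for i in 1 2 3; do
  # register an S3 repository (placeholder names)
  curl -s -XPUT 'localhost:9200/_snapshot/test_repo' \
       -H 'Content-Type: application/json' \
       -d '{"type":"s3","settings":{"bucket":"test-bucket"}}'
  # take a snapshot and wait for it to finish
  curl -s -XPUT "localhost:9200/_snapshot/test_repo/snap_$i?wait_for_completion=true"
  # remove the repository again (the snapshots themselves stay in the bucket)
  curl -s -XDELETE 'localhost:9200/_snapshot/test_repo'
done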

elasticsearch@es-archives-es-master-data-0:~/data/indices$ grep VexdZNC0TzGCU0CxQua2ug/0/index/_2n.cfs /proc/95/maps
7cc3f2400000-7cc3fcd0f000 r--s 00000000 08:a0 5243782                    /usr/share/elasticsearch/data/indices/VexdZNC0TzGCU0CxQua2ug/0/index/_2n.cfs
7ccba0000000-7ccbaa90f000 r--s 00000000 08:a0 5243782                    /usr/share/elasticsearch/data/indices/VexdZNC0TzGCU0CxQua2ug/0/index/_2n.cfs
7cd362000000-7cd36c90f000 r--s 00000000 08:a0 5243782                    /usr/share/elasticsearch/data/indices/VexdZNC0TzGCU0CxQua2ug/0/index/_2n.cfs
7cd8a1800000-7cd8ac10f000 r--s 00000000 08:a0 5243782                    /usr/share/elasticsearch/data/indices/VexdZNC0TzGCU0CxQua2ug/0/index/_2n.cfs
elasticsearch@es-archives-es-master-data-0:~/data/indices$ grep VexdZNC0TzGCU0CxQua2ug/0/index/_2n.cfs /proc/95/maps
7cbc41600000-7cbc4bf0f000 r--s 00000000 08:a0 5243782                    /usr/share/elasticsearch/data/indices/VexdZNC0TzGCU0CxQua2ug/0/index/_2n.cfs
7cc3f2400000-7cc3fcd0f000 r--s 00000000 08:a0 5243782                    /usr/share/elasticsearch/data/indices/VexdZNC0TzGCU0CxQua2ug/0/index/_2n.cfs
7ccba0000000-7ccbaa90f000 r--s 00000000 08:a0 5243782                    /usr/share/elasticsearch/data/indices/VexdZNC0TzGCU0CxQua2ug/0/index/_2n.cfs
7cd362000000-7cd36c90f000 r--s 00000000 08:a0 5243782                    /usr/share/elasticsearch/data/indices/VexdZNC0TzGCU0CxQua2ug/0/index/_2n.cfs
7cd8a1800000-7cd8ac10f000 r--s 00000000 08:a0 5243782                    /usr/share/elasticsearch/data/indices/VexdZNC0TzGCU0CxQua2ug/0/index/_2n.cfs
elasticsearch@es-archives-es-master-data-0:~/data/indices$ grep VexdZNC0TzGCU0CxQua2ug/0/index/_2n.cfs /proc/95/maps
7cb490000000-7cb49a90f000 r--s 00000000 08:a0 5243782                    /usr/share/elasticsearch/data/indices/VexdZNC0TzGCU0CxQua2ug/0/index/_2n.cfs
7cbc41600000-7cbc4bf0f000 r--s 00000000 08:a0 5243782                    /usr/share/elasticsearch/data/indices/VexdZNC0TzGCU0CxQua2ug/0/index/_2n.cfs
7cc3f2400000-7cc3fcd0f000 r--s 00000000 08:a0 5243782                    /usr/share/elasticsearch/data/indices/VexdZNC0TzGCU0CxQua2ug/0/index/_2n.cfs
7ccba0000000-7ccbaa90f000 r--s 00000000 08:a0 5243782                    /usr/share/elasticsearch/data/indices/VexdZNC0TzGCU0CxQua2ug/0/index/_2n.cfs
7cd362000000-7cd36c90f000 r--s 00000000 08:a0 5243782                    /usr/share/elasticsearch/data/indices/VexdZNC0TzGCU0CxQua2ug/0/index/_2n.cfs
7cd8a1800000-7cd8ac10f000 r--s 00000000 08:a0 5243782                    /usr/share/elasticsearch/data/indices/VexdZNC0TzGCU0CxQua2ug/0/index/_2n.cfs
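A quick way to tally how many times each index file is currently mapped (PID 95 is the Elasticsearch process, as above):

awk '$NF ~ /\/indices\// { n[$NF]++ } END { for (f in n) if (n[f] > 1) print n[f], f }' /proc/95/maps | sort -rn | head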

I've created a ticket for it with detailed memory breakdowns (Page Tables Memory Leak During S3 Snapshots · Issue #131855 · elastic/elasticsearch · GitHub), but it was closed. IMO this is not expected behavior, as it leads to OOM. This only happens in versions 8 and 9; version 7 is not affected.

This is quite different, and much more useful, information than what you shared in the GitHub issue.

However, I still think this is expected: these mmapped files should be unmapped by a GC in due course, so this is not in itself evidence of a memory leak.
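One way to check that, assuming jcmd is available in the container, is to force a full GC and then re-count the mappings for the same file:

jcmd 95 GC.run    # request a full GC in the Elasticsearch JVM (PID 95)
grep -c 'VexdZNC0TzGCU0CxQua2ug/0/index/_2n.cfs' /proc/95/maps    # re-count the mappings afterwards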

Hi David,

The memory graphs show they were never unmapped over a 3-month span, which eventually led to an OOM kill. Removing the repo does not unmap them.
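For reference, the growth is also visible directly on the host without the graphs (again PID 95):

grep VmPTE /proc/95/status      # page-table memory charged to the Elasticsearch process
wc -l < /proc/95/maps           # total number of live mappings
grep PageTables /proc/meminfo   # node-wide page-table usage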

regards

My understanding is somewhat out of date - these days they should be unmapped immediately on close. However, there appears to be an ongoing discussion about a bug in Lucene that I believe matches these symptoms.

Hi David,

Should I open a new ticket on this? Or can we have the old one reopened?

regards

There's no action to take here on the Elasticsearch side; this is something that needs fixing in Lucene, and the issue for that is already open.