Yeah, for now I will not change the heap size. I will only set -Dorg.apache.lucene.store.MMapDirectory.sharedArenaMaxPermits=1.
Really, thank you so much. You helped me a lot :)
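For anyone following along, a minimal way to apply that setting (assuming JVM flags are managed via a custom options file; the filename here is just an example) is to create config/jvm.options.d/mmap.options on each data node with the single line

-Dorg.apache.lucene.store.MMapDirectory.sharedArenaMaxPermits=1

and then restart the node so the JVM picks it up.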
I monitored 1 data node. When the Elasticsearch process reaches about 86% RAM usage in top (RSS ≈ 13.5 GB), the Native Memory Tracking report shows:
Total: reserved = 10402009 KB, committed = 9058861 KB
The maximum number of mappings at that moment is around 84,833.
So from NMT I see roughly 10 GB reserved (about 9 GB committed), while the process RSS reaches up to 14.2 GB.
My questions are:
Is this difference between RSS (13.5 GB) and NMT “Total reserved” (~10 GB) expected, or does it indicate a possible native memory leak?
Does the “Total” in the Native Memory Tracking report include memory used for Lucene’s mmapped index files and the OS page cache, or are those outside what NMT accounts for?
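For context, figures like these can be collected roughly as follows (a sketch; $PID stands for the Elasticsearch process id, and the NMT report requires the JVM to have been started with -XX:NativeMemoryTracking=summary or detail):

jcmd $PID VM.native_memory summary   # the "Total: reserved / committed" figures
wc -l /proc/$PID/maps                # number of memory mappings
grep VmRSS /proc/$PID/status         # resident set size as seen by the kernel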
thanks for sticking with us …
There’s the heap and direct memory too, so I’d consider this “expected”. RSS is the total resident set (aka RAM used) of the JVM process, covering all the various “types” of memory.
Yes to Lucene’s mmapped files. Operating system caches are not memory allocated to the JVM.
This is broadly what you shared before. And it is not growing?
Compare both RSS and that value with any other data nodes.
But also understand that the process’ RSS increasing is not in itself an indication of a leak.
EDIT: Consider attaching jconsole to all your data nodes to help monitor them. An implicit assumption I’ve made here is that all your data nodes are roughly equal in terms of spec, shards, query/ingest load, etc. Is that the case? Can you share the output of a GET (use Dev Tools) on
_cat/nodes?v&h=name,ip,role,version,master,u,cpu,rc,rm,rp,hc,hm,hp,load_1m,load_5m,load_15m&bytes=b
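Outside Dev Tools, the same request can be made with curl along these lines (a sketch; the URL and the -u elastic user are assumptions, adjust for your TLS/auth setup):

curl -s -u elastic 'https://localhost:9200/_cat/nodes?v&h=name,ip,role,version,master,u,cpu,rc,rm,rp,hc,hm,hp,load_1m,load_5m,load_15m&bytes=b'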
name role version master u cpu rc rm rp hc hm hp load_1m load_5m load_15m
prod-elastic-data03 di 9.0.0 - 3.9d 12 16297820160 16728715264 97 6042545512 8589934592 70 1.21 0.84 0.84
prod-elastic-data08 di 9.0.0 - 5.4d 37 16398499840 16728768512 98 5905580032 8589934592 68 1.84 1.95 2.01
prod-elastic-data01 di 9.0.0 - 5.4d 21 15351324672 16875683840 91 4529254496 8589934592 52 2.89 2.47 2.37
prod-elastic-data07 di 9.0.0 - 5.5d 17 16552493056 16875651072 98 4536139776 8589934592 52 1.71 1.69 1.56
prod-elastic-data02 di 9.0.0 - 4.4d 22 16544157696 16875622400 98 6079643648 8589934592 70 2.12 2.31 2.06
prod-elastic-data05 di 9.0.0 - 3.7d 29 16554582016 16875663360 98 3088269760 8589934592 35 2.33 2.19 2.30
prod-elastic-data06 di 9.0.0 - 5.6d 33 16544481280 16875675648 98 3870582392 8589934592 45 0.96 1.00 1.20
prod-elastic-data04 di 9.0.0 - 3.8d 9 16528756736 16875642880 98 6221453072 8589934592 72 0.25 0.25 0.30
Thank you for the explanation! I can see similar results across the data nodes.
Best not to paste pictures of text please, just paste the text itself.
Any more crashes?
I see you changed the ip to io, that’s fine, we don’t really care about your IP addresses.
tbh your output looks pretty healthy to me, except none of your data nodes has been up for long, I guess as a result of restarting them with different JVM options. Across your data nodes the heap varies a bit, but I’ve no idea where in its GC cycle each node is (jconsole is good for watching this). The cluster is not significantly loaded. The fact that you are on 9.0.0 stands out a bit; that’d be fine if it had been released last week, but it looks like you upgraded at least 72 days ago. Was 9.0.0 the latest release when you did the upgrade? (Unlikely, as 9.0.1 followed fairly soon, and other point releases since then too.)
As of right now I’m thinking it’s unlikely that you have hit the specific bug we’ve discussed: you seem to get a different error, your counts seem well short of the limit, and you say wc -l /proc/$pid/maps is fairly consistent across all the data nodes?
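If it helps, the mapping count and the limit it is usually compared against can be checked on each data node like this (a sketch; that the relevant limit is the kernel's vm.max_map_count is my assumption here, and $PID is the Elasticsearch process id):

wc -l /proc/$PID/maps     # current number of mappings for the process
sysctl vm.max_map_count   # per-process mapping limit (Elasticsearch's bootstrap check requires at least 262144)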
Outside of doing an upgrade, I’m not sure what to suggest now to nail down the issue.
Others are welcome to chime in.
Hi @Aysel_Guliyeva Welcome to the community... I see you are getting lots of help.
I was a bit surprised at the response count so I thought I would drop in.
I see you are getting lots of detailed help...
AND so I will go back to the beginning
If this is an important Elasticsearch cluster (i.e. the data nodes), it is an anti-pattern to run other applications on the same VMs... the fact that you are trying to calculate the memory used by apps like Grafana etc., add it all up, and make it all fit... is a bad plan... period.
This is a classic pattern of memory competition... there are lots of reasons why application memory can spike, and as soon as Elasticsearch cannot get the memory it requires... poof!
You are running out of memory... you are most likely colliding with some other app that is claiming memory. Running top here and there is not going to give you enough insight.
On top of this, if the approach in the environment is high utilization (which is a completely valid approach) but the underlying virtualization is NOT pinning CPU and memory, Elastic can and will be unstable... i.e. is the underlying VM "thin provisioned"?
I used to give a talk at ElasticON on this very topic... I have seen it over and over again.
All these other settings are a valid discussion, but my perspective is that they are most likely not the root cause nor the fix for the basic underlying issue.
Elastic node with Dedicated Host or VMs (with dedicated resources) = Best / Stable Outcome
Kevin, thank you for your answer. Yes, I changed the JVM options for the data nodes, so I restarted them. The Elasticsearch version was upgraded to 9.0.0 about 5 months ago, I think. None of the data nodes have crashed during our discussion.
Hello. Thank you!
Only node exporter runs on the Elasticsearch data nodes. Other apps such as Grafana and Prometheus are located on separate VMs.
RAM, CPU and disk are statically reserved for Elasticsearch data nodes.
Thank you for the explanation.
Thanks for sharing the various updates. To summarize where we are now (correct if I got anything wrong):
wc -l /proc/$pid/maps

A couple of questions:
any update @Aysel_Guliyeva ?
Hello, Kevin. Thank you for your answer.
All of these points are true.
The answers to your questions:
Yes. But in Elasticsearch 8.13.4 we didn’t have a lot of latency rules for services in APM. Now a latency rule for each service, a rate limit rule, and another custom ESQL rule for custom services are running in APM; before, they didn’t exist. So I can’t compare the current situation on Elasticsearch 9.0.0 with Elasticsearch 8.13.4.
So far, the Elasticsearch service has been stable and has not crashed on any data nodes.
So, either the setting above fixed it, or we are still waiting for the next crash.
To guess whether it's the former, I'd say somewhere around 2 times the longest recent gap between data-node crashes is a decent estimate. So if it ran for 10 days without crashing sometime in Nov, say, and that's the best recent no-crash gap, then you'd need to wait until ca. 20 days from now. Even then you cannot be sure, but the evidence would be mounting.
If it's the latter, it's important to capture as much info as possible about the crash when it happens. Run the jcmd command every minute or so, maybe with the VM.native_memory detail and Thread.print options, watch jconsole, ...
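For example, something along these lines could run while waiting for the next crash (a sketch only; $PID and the output paths are placeholders, and VM.native_memory needs -XX:NativeMemoryTracking=detail enabled at JVM startup):

while true; do
  ts=$(date +%Y%m%d-%H%M%S)
  jcmd $PID VM.native_memory detail > /var/tmp/nmt-$ts.txt     # native memory breakdown
  jcmd $PID Thread.print > /var/tmp/threads-$ts.txt            # thread dump
  wc -l /proc/$PID/maps >> /var/tmp/maps-count.log             # mapping count over time
  sleep 60
done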
Also, there is no swap partition on the VMs, right? Nor on the hosts hosting the VMs?
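A quick way to confirm that on each VM (and, where you have access, on the hypervisor hosts):

swapon --show             # no output means no swap devices are active
free -h                   # the Swap line should show 0B
grep -i swap /etc/fstab   # make sure nothing re-enables swap at boot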
( and I would welcome more input from @stephenb too )
BTW 9.0.0 ... you will probably want to get to 9.2.2+ at some point; there have already been many improvements.
Funny you mention this... I spent a looong time with another user where the Elastic process was dying / being killed unexpectedly... it turned out to be a new corporate security scan (Qualys, I think) that was not recognizing the process and killing it... that was painful.
We were there already; it was answered with:
Yep, though likely Qualys. Stuff like this needs checking, because the "I" in "RACI" is very rarely respected.
@Aysel_Guliyeva - are we still waiting for that next crash? what does a GET on
return now?