Hello, I recently upgraded to 7.4 and I'm having a problem where the heap reclaimed on each cycle of the GC sawtooth pattern keeps shrinking (e.g. the first collection takes the heap from 75% down to 20%, but subsequent collections only get it down to 30%, then 40%, and so on).
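For reference, here's a minimal sketch of how that shrinking post-GC floor can be tracked numerically, assuming plain HTTP access to the cluster's `_nodes/stats/jvm` endpoint (the host/port below is a placeholder and the `requests` library is assumed):

```python
import time

import requests

# Placeholder endpoint; adjust host/port/auth for your cluster.
STATS_URL = "http://localhost:9200/_nodes/stats/jvm"


def poll_heap_and_old_gc(interval_secs=60):
    """Print heap usage and cumulative old-gen GC counters for every node.

    If the heap percentage observed right after old-gen collections keeps
    creeping upward, the floor of the sawtooth is rising, i.e. the amount
    of live data surviving each collection is growing.
    """
    while True:
        resp = requests.get(STATS_URL, timeout=10)
        resp.raise_for_status()
        for node_id, node in resp.json()["nodes"].items():
            jvm = node["jvm"]
            old_gc = jvm["gc"]["collectors"]["old"]
            print(
                f"{node.get('name', node_id)}: "
                f"heap {jvm['mem']['heap_used_percent']}% | "
                f"old GC count {old_gc['collection_count']} | "
                f"old GC time {old_gc['collection_time_in_millis']} ms"
            )
        time.sleep(interval_secs)


if __name__ == "__main__":
    poll_heap_and_old_gc()
```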
Is there a query that will show how much heap each index is using?
Not the standard monitoring tools. I'm using AWS's managed Elasticsearch but I'm preparing to move off of it over the next few days. Being unable to get a heap dump has really stymied my debugging.
Ahh yeah, they do have somewhat limited tools in that area unfortunately.
You can try out elastic.co/cloud as it includes better monitoring out of the box. It'll show things like query and indexing rates, resource usage and more. You won't get a heap dump though, as it's *aaS, so there's no host access.
Generally 5 nodes (but I've tried 3 as well), 10 shards per index, 18 indexes.
I've tried everything from 5x8 GB nodes (I assume heap is 40-50% of that) up to 5x32 GB nodes. The larger nodes last longer, simply due to the bigger heap, I suspect. However, I have to keep triggering Amazon rollovers every day or so before the cluster crashes (even with twice the machine power I had before the upgrade).
The strange detail is that the total index size is not large (~25 GB) and CPU usage generally stays fairly low (max around 35%). It's a real mystery.
The queries are all over the map, at around 2,500 per minute: lots of aggregations and filter operations (there are 471 distinct Elasticsearch calls). Indexing averages a couple hundred per minute.
The documents are generally fairly small (although there are millions of them) and average a couple dozen fields each (with the exception of one index whose documents have almost a hundred).
I've looked at copious amounts of memory output from the various Elasticsearch stats endpoints and everything looks fine except the heap-used number. The query cache is only in the megabytes, and fielddata is almost zero.
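For reference, here's a minimal sketch of the kind of per-node check I've been doing, via the `_cat/nodes` API (the host/port is a placeholder):

```python
import requests

# Placeholder endpoint; adjust host/port/auth for your cluster.
CAT_NODES_URL = "http://localhost:9200/_cat/nodes"

# Heap usage plus the main per-node on-heap consumers exposed by _cat/nodes.
COLUMNS = ",".join([
    "name",
    "heap.percent",
    "heap.current",
    "fielddata.memory_size",
    "query_cache.memory_size",
    "request_cache.memory_size",
    "segments.memory",
])

resp = requests.get(
    CAT_NODES_URL,
    params={"format": "json", "h": COLUMNS, "bytes": "b"},
    timeout=10,
)
resp.raise_for_status()

# Each row is one node; with bytes=b the sizes come back as raw byte counts.
for row in resp.json():
    print(row)
```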
It would be nice if Elasticsearch had a "data associated with this index uses 5 GB of heap" endpoint, but I haven't found one. I'm hoping there is a single index causing trouble, which would help isolate the issue.
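A rough proxy, as a minimal sketch: sum the per-shard `size.memory` column from `_cat/segments` by index. The host/port below is a placeholder, and this only covers Lucene segment memory (terms, norms, points, doc values), not every heap cost tied to an index:

```python
from collections import defaultdict

import requests

# Placeholder endpoint; adjust host/port/auth for your cluster.
CAT_SEGMENTS_URL = "http://localhost:9200/_cat/segments"

resp = requests.get(
    CAT_SEGMENTS_URL,
    params={"format": "json", "h": "index,size.memory", "bytes": "b"},
    timeout=30,
)
resp.raise_for_status()

# Sum Lucene segment memory across all shards of each index.
per_index = defaultdict(int)
for segment in resp.json():
    per_index[segment["index"]] += int(segment["size.memory"])

# Largest consumers first, in megabytes.
for index, heap_bytes in sorted(per_index.items(), key=lambda kv: -kv[1]):
    print(f"{index}: {heap_bytes / 1024 ** 2:.1f} MB of segment memory")
```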
I've been running my own cluster all day, side by side with the Amazon cluster, forwarding traffic to both. My cluster is running perfectly, while the Amazon cluster can't stay up for more than 5 hours. My cluster also costs less than a quarter as much.