Elasticsearch High CPU Usage - GC Not Working

tokh · August 2, 2016, 2:28am

Hello Michael, we are facing similar issues. Were you able to solve this problem in the meantime?
I looked around a lot and couldn't find anything conclusive. The most promising lead is a case suggesting
a Linux kernel bug. See this GitHub issue.

What OS are you using?
We are using old Linux machines with kernel 3.11.10. I'm thinking of trying a kernel upgrade next (if I get permission to do so)...

Michael_Sander · September 24, 2016, 7:21am

@tokh I added more RAM, which helps, but in general, still no solution.

Michael_Sander · November 4, 2016, 5:50am

Good news, I fixed the issue for good! I want to share what I found. I am in NYC, and a few weeks ago Elastic{ON} came to town and brought a number of support engineers offering free help. I came prepared and brought a 30GB Java memory dump that was taken at the time of the problem. I had spent hours staring at that memory dump, but could not make sense of it. The ES engineer took about 30 seconds to find the problem.

The problem had to do with paging and Googlebot. We have millions of documents in our database, and accordingly, have millions of pages of results. Most of our users rarely look past page 3 or 4, but Googlebot would routinely traverse hundreds of thousands of pages deep. I don't truly understand the underlying issue, but apparently that was the source of the problem. As I understand it, when you page to, for example, page 100,000, some information is kept in memory about pages 1 to 99,999. This can suck up your memory fast.

The solution was to not let the user page through more than a few hundred or a few thousand pages (I believe we kept it to a maximum of 3000 pages). The problem went away immediately.
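For anyone who wants to do the same, here is a rough sketch of that kind of cap. It is not our actual code: the function name, the page size of 10, and the constants are made up for illustration. The only real Elasticsearch parts are the `from` and `size` fields of the search request body.

```python
MAX_PAGE = 3000   # cap mentioned above; pick whatever fits your site
PAGE_SIZE = 10    # assumed page size, not stated in this thread


def build_search_body(query, page, page_size=PAGE_SIZE, max_page=MAX_PAGE):
    """Build a search request body with the page number clamped to [1, max_page].

    Hypothetical helper: however deep a crawler asks to go, the offset
    sent to Elasticsearch never exceeds (max_page - 1) * page_size.
    """
    page = max(1, min(int(page), max_page))
    return {
        "query": query,
        "from": (page - 1) * page_size,  # offset of the first hit on this page
        "size": page_size,               # number of hits per page
    }


if __name__ == "__main__":
    # A request for page 100,000 is served as page 3,000 instead.
    print(build_search_body({"match_all": {}}, 100000))
```

Clamping silently is only one option. Returning an error or a "no more results" page for anything past the cap would work too.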

If anyone could offer any additional insight as to why this occurs, I would appreciate it. I did not see anything in the documentation warning about this issue. Also, a big shout-out to the ES engineer who found the issue, I was impressed.

tokh · November 21, 2016, 2:57am

Congratulations. Just an update on our side: we are still facing the issue. We changed the OS to one with kernel 4.4.0-45-generic, but without success. So, the link I posted previously did not help.
Our issue might be related to our heavy use of aggregations. So I wonder if there is a way to force garbage collection once in a while.

Alexander_Trumper · November 26, 2016, 9:34am

Hi, I guess it's about the deep pagination issue: https://www.elastic.co/guide/en/elasticsearch/guide/current/pagination.html
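As far as I understand that page, every shard has to build a sorted list of the top `from + size` hits, and the coordinating node then has to sort all of those lists together before throwing most of them away. A back-of-the-envelope sketch of the arithmetic is below. The function name is made up, and the page size of 10 and the 5 shards are only assumptions for illustration, not numbers from this thread.

```python
def deep_page_cost(page, page_size=10, shards=5):
    """Estimate how many hits must be collected to serve one result page.

    Hypothetical helper for illustration only.
    Returns (hits each shard must produce, hits the coordinating node must sort).
    """
    offset = (page - 1) * page_size   # the "from" value of the request
    per_shard = offset + page_size    # each shard returns its top from + size hits
    return per_shard, per_shard * shards


if __name__ == "__main__":
    for page in (3, 3000, 100000):
        per_shard, total = deep_page_cost(page)
        print(f"page {page}: {per_shard} hits per shard, {total} hits to merge")
```

So the cost grows with how deep the page is, not with the 10 hits that are actually returned, which would explain why a crawler going hundreds of thousands of pages deep hurts so much.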

tokh · December 1, 2016, 7:21am

Thanks for the advice. However, in our case that wasn't the issue. We finally figured out our problem: we had some bad Groovy query scripts that caused memory leaks. Refer to this GitHub discussion.
