ES consumes high CPU in ThreadLocal

Hi all,
I'm using ES 2.4.2 with Java 8u60 and G1GC. Recently we found that the load often goes very high (20~25, while our machines have 32 vcores). After profiling, we found that "ThreadLocalMap.expungeStaleEntry" is consuming 80%+ of the CPU. How can this be? There are only 48 search threads, and the searches should not take that much time. Sorry, I cannot upload my flame.svg; below is a snapshot.

whoaa.. flame graphs!

If you stop the searches, do you get better CPU idle time?

You are using an old Elasticsearch version (two major versions behind), an old Java version, and a GC that is not recommended. Every one of those components has seen a huge number of fixes in the last years, so it is super tedious to try to debug this.

For example, newer Elasticsearch versions removed that thread local almost 1.5 years ago, see https://github.com/elastic/elasticsearch/pull/20778
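For anyone wondering how a thread local can burn CPU at all, here is a minimal generic sketch. It is not the Elasticsearch code, and all class and method names are made up for illustration. The JDK's ThreadLocalMap holds its keys through weak references, so every ThreadLocal instance that is created, used on a thread, and then dropped leaves an entry behind in that thread's map. Those stale entries are only cleaned up lazily during later get/set/remove calls, which is where expungeStaleEntry runs. Reusing one static ThreadLocal keeps the map at one entry per thread.

```java
final class ThreadLocalReuseSketch {

    // Hypothetical per-thread scratch buffer. A single static ThreadLocal
    // means at most one entry per thread in that thread's ThreadLocalMap.
    private static final ThreadLocal<StringBuilder> SHARED_BUFFER =
            ThreadLocal.withInitial(StringBuilder::new);

    // Pattern to avoid: a new ThreadLocal instance per call. Each get()
    // adds another entry to the calling thread's map. Once the instance
    // is unreachable and the GC clears its weak key, the entry is stale
    // and has to be expunged by a later map operation.
    static String formatWithFreshThreadLocal(int value) {
        ThreadLocal<StringBuilder> perCall = ThreadLocal.withInitial(StringBuilder::new);
        StringBuilder sb = perCall.get();
        sb.append("doc-").append(value);
        return sb.toString();
    }

    // Preferred pattern: reuse the shared instance and reset its state.
    static String formatWithSharedThreadLocal(int value) {
        StringBuilder sb = SHARED_BUFFER.get();
        sb.setLength(0); // clear what the previous call on this thread left behind
        sb.append("doc-").append(value);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Both variants return the same result; they differ only in how
        // many map entries they leave behind on the calling thread.
        System.out.println(formatWithFreshThreadLocal(1));
        System.out.println(formatWithSharedThreadLocal(1));
        SHARED_BUFFER.remove(); // explicit cleanup when the thread is done with it
    }
}
```

On pooled threads like a search thread pool, the threads live for a long time, so entries left behind by the first pattern can pile up over many requests.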


Thanks for your reply! So I think the best choice is to upgrade ES to 5.x?
I've updated the JDK to jdk8u161 on one of my clusters and wonder what will happen next.

BTW, which version would you suggest for a production environment?

As the load was too high, I restarted my ES nodes one by one.

The latest 6.x release is what I recommend. Also, you cannot just go from 2.x to 6.x. This requires planning, as the data format is not compatible, and neither are the queries and mapping configuration. So make sure to test everything in preproduction systems before doing this on your live system.
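To make the data format point concrete, here is a rough sketch of the compatibility rule as I understand it for 5.x and 6.x: a major version can open indices created in the same or the previous major version, but nothing older. The helper names are made up for illustration, and the rule is a simplification, so check the upgrade documentation for your exact versions.

```java
final class UpgradePathSketch {

    // Extracts the major version number from a string such as "2.4.2".
    static int majorOf(String version) {
        return Integer.parseInt(version.trim().split("\\.")[0]);
    }

    // Assumed rule (simplified): a node on targetMajor can read indices
    // created on targetMajor or on targetMajor - 1, and nothing older.
    static boolean canReadIndex(int targetMajor, int indexCreatedMajor) {
        return indexCreatedMajor <= targetMajor
                && indexCreatedMajor >= targetMajor - 1;
    }

    public static void main(String[] args) {
        int current = majorOf("2.4.2");
        System.out.println("2.x indices readable by 5.x: " + canReadIndex(5, current));
        System.out.println("2.x indices readable by 6.x: " + canReadIndex(6, current));
    }
}
```

Under that rule, indices created on 2.x have to be reindexed (on 5.x or from a remote cluster) before a 6.x cluster can use them, on top of the query and mapping changes.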

