Elasticsearch 5.2.2: Memory keeps increasing steadily until ES gets killed by the system OOM killer

Our environment is a 3-node ES cluster, all three being data nodes. We have upgraded ES from 2.3.3 to 5.2.2.

The data nodes are allocated a 31 GB heap (as recommended by the ES community). Node 1 mostly serves search requests, while Node 2 and Node 3 are used for bulk insertions.

We have seen a constant rise in the memory usage of ES Node 1 (the node mostly used for search queries): it starts at 32 GB resident memory, and the resident memory then climbs to 40 GB... 45 GB... 50 GB... 56 GB... 60 GB... 62 GB, until the process is KILLED by the kernel's OOM killer. This happens over a period of 24-30 hours. The only thing we can do at that point is restart ES, and the same cycle repeats.
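To confirm where the growth is coming from, here is a minimal sketch (assuming Python 3 is available on the node and that Node 1 answers on http://localhost:9200) that periodically compares the JVM heap usage reported by the nodes stats API with the resident set size of the Elasticsearch process. If the heap stays flat around 31 GB while RSS keeps climbing, the growth is off-heap.

```python
# Minimal sketch: compare JVM heap_used with the Elasticsearch process RSS
# to confirm that the growth happens outside the heap.
# Assumes the script runs on the node itself and ES answers on localhost:9200.
import json
import subprocess
import time
import urllib.request

HOST = "http://localhost:9200"  # hypothetical; point at Node 1


def heap_used_bytes():
    """Sum heap_used_in_bytes for the local node from the nodes stats API."""
    with urllib.request.urlopen(HOST + "/_nodes/_local/stats/jvm") as resp:
        stats = json.load(resp)
    return sum(n["jvm"]["mem"]["heap_used_in_bytes"] for n in stats["nodes"].values())


def es_rss_bytes():
    """Resident set size of the Elasticsearch JVM, read via ps (kB -> bytes)."""
    pid = subprocess.check_output(
        ["pgrep", "-f", "org.elasticsearch.bootstrap.Elasticsearch"]
    ).decode().split()[0]
    rss_kb = subprocess.check_output(["ps", "-o", "rss=", "-p", pid]).decode().strip()
    return int(rss_kb) * 1024


while True:
    heap, rss = heap_used_bytes(), es_rss_bytes()
    print("heap_used=%.1f GiB  rss=%.1f GiB  off_heap~=%.1f GiB"
          % (heap / 2**30, rss / 2**30, (rss - heap) / 2**30))
    time.sleep(300)  # one sample every 5 minutes over the 24-30 h window
```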

I have already gone through "Out of memory (invoked oom-killer)", but it doesn't apply here, as we are running on physical servers with CentOS 6.8 (kernel 2.6.32-642.6.2.el6), 64 GB RAM and 24-core processors.

From this article, https://www.elastic.co/guide/en/elasticsearch/guide/current/heap-sizing.html, our guess is that the memory above the 32 GB mark is related to Lucene caching. Can someone please shed more light on what's happening here?
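To see what the memory beyond the heap might be going to, a sketch like the one below (same assumptions as above: Python 3, node reachable on http://localhost:9200) dumps the memory-related counters the nodes stats API exposes for the local node. Note these counters are a mix of on-heap and off-heap structures; anything not accounted for here is most likely the OS page cache for memory-mapped segment files plus direct buffers.

```python
# Minimal sketch: print the memory-related counters from the nodes stats API
# for the local node, to see which caches / segment structures are growing.
import json
import urllib.request

HOST = "http://localhost:9200"  # hypothetical; point at Node 1

with urllib.request.urlopen(HOST + "/_nodes/_local/stats/indices") as resp:
    stats = json.load(resp)

GIB = float(2 ** 30)
for node_id, node in stats["nodes"].items():
    idx = node["indices"]
    print(node.get("name", node_id))
    print("  segments memory : %.2f GiB" % (idx["segments"]["memory_in_bytes"] / GIB))
    print("  terms memory    : %.2f GiB" % (idx["segments"]["terms_memory_in_bytes"] / GIB))
    print("  query cache     : %.2f GiB" % (idx["query_cache"]["memory_size_in_bytes"] / GIB))
    print("  request cache   : %.2f GiB" % (idx["request_cache"]["memory_size_in_bytes"] / GIB))
    print("  fielddata       : %.2f GiB" % (idx["fielddata"]["memory_size_in_bytes"] / GIB))
```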

I have gone through GitHub issues which say that some memory leak issues were already fixed before ES 5.2.2.

It'd be great if someone could help us understand this behaviour and suggest a possible solution.

PS: By contrast, this was not happening with ES 2.3.3.

Thanks.

We're still facing this. Does anyone have any specific hints for debugging it?

Assuming your indices are spread out across all nodes in the cluster, do try to distribute bulk indexing requests and queries evenly across all nodes (see the sketch below). It is probably also worth upgrading to the latest version, as I believe some memory-related issues have been fixed.
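As a rough illustration of that suggestion, here is a sketch using the official elasticsearch-py client (5.x series): give the client all three nodes instead of sending bulk traffic only to Node 2/3 and searches only to Node 1, and it will round-robin requests over the configured hosts. Hostnames, index and document names below are made up.

```python
# Minimal sketch: load-balance both bulk indexing and searches across all
# three nodes via the official Python client, rather than dedicating nodes.
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(
    ["es-node1:9200", "es-node2:9200", "es-node3:9200"],  # hypothetical hostnames
    sniff_on_start=True,             # discover the cluster and keep the host list fresh
    sniff_on_connection_fail=True,
    retry_on_timeout=True,
)

# Bulk indexing: the transport rotates through the configured hosts.
actions = (
    {"_op_type": "index", "_index": "myindex", "_type": "doc", "_source": {"n": i}}
    for i in range(10000)
)
helpers.bulk(es, actions)

# Searches go through the same round-robin transport instead of a single node.
resp = es.search(index="myindex", body={"query": {"match_all": {}}})
print(resp["hits"]["total"])
```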

Can you point us to a few of those issues / changelog entries? It'll help us evaluate and proceed further.
