Out of Memory Error

We got an OOM error earlier this week, and it looks like some resource other than memory may be the problem. These boxes have 64 GB of RAM, with 30 GB allocated to the Java heap. At the time of the error they were all using around 30-35% of the JVM's heap.

All nodes reported a 65k file descriptor limit before and after the error and the cluster restart.
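(For reference, heap and file descriptor usage per node can be pulled from the node stats API, something along these lines:

curl -s 'http://localhost:9200/_nodes/stats/jvm,process?pretty'

jvm.mem.heap_used_percent and process.open_file_descriptors are the relevant fields.)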

We are using the JDBC river plugin (yes, for now anyway), and the river run seems to be what triggers it. ( jdbc-1.5.0.5-da4ba96 1.5.0.5 )

Any clues, hints, suggestions are appreciated.

Thanks
-Doug

[2015-10-19 05:10:07,722][WARN ][index.engine             ] [es4] [sales][1] failed engine [out of memory (source: [maybe_merge])]
java.lang.OutOfMemoryError: unable to create new native thread
	at java.lang.Thread.start0(Native Method)
	at java.lang.Thread.start(Thread.java:714)
	at org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:391)
	at org.elasticsearch.index.merge.EnableMergeScheduler.merge(EnableMergeScheduler.java:50)
	at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1985)
	at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1979)
	at org.elasticsearch.index.engine.InternalEngine.maybeMerge(InternalEngine.java:778)
	at org.elasticsearch.index.shard.IndexShard$EngineMerger$1.run(IndexShard.java:1241)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

GET /_nodes/process:

{
   "cluster_name": "es_prod",
   "nodes": {
      "uvieBMr3SXKu7BvKuWEJmQ": {
         "name": "es4",
         "version": "1.7.0",
         "build": "929b973",
         "http_address": "inet[/12.130.11.49:9200]",
         "process": {
            "refresh_interval_in_millis": 1000,
            "id": 13702,
            "max_file_descriptors": 65535,
            "mlockall": true
         }
      },
      "xnOltXOsS5eY7VAotvYiSg": {
         "name": "es2",
         "version": "1.7.0",
         "build": "929b973",
         "http_address": "inet[/12.130.11.47:9200]",
         "process": {
            "refresh_interval_in_millis": 1000,
            "id": 43023,
            "max_file_descriptors": 65535,
            "mlockall": true
         }
      },
      "imfqR95jSKOhErEN8nQk3w": {
         "name": "es3",
         "version": "1.7.0",
         "build": "929b973",
         "http_address": "inet[/12.130.11.48:9200]",
         "process": {
            "refresh_interval_in_millis": 1000,
            "id": 26569,
            "max_file_descriptors": 65535,
            "mlockall": true
         }
      }
   }
}

The JVM reports out of memory ("unable to create new native thread") when it exceeds the number of processes allowed for that user. Processes and threads are counted together, per user. You need to increase the value before starting Elasticsearch.
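A quick way to see how close you are is to compare the thread count of the ES process with the per-user limit, e.g. for the es4 node (PID 13702 from your node listing; <es_user> is whatever account runs Elasticsearch):

grep Threads /proc/13702/status      # threads inside the ES process
ps -L -u <es_user> | wc -l           # rough count of all threads/processes for that user
ulimit -u                            # per-user limit in the current shell

When the second number gets near the third, thread creation fails with exactly this error.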

/Michael

Is the value you're referring to from limits.conf?
ulimit reports: max user processes (-u) 514271

That's correct. The value is set per user, though, so you have to check it for the user that runs ES.
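For example (assuming the service runs as an 'elasticsearch' user; adjust to your setup):

sudo -u elasticsearch sh -c 'ulimit -u'

ulimit in a root shell can report a completely different value than what the ES user actually gets.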

/Michael

You could also try launching a separate node (with node.data: false and node.master: false) for the JDBC river plugin, to move the JDBC processing workload off the nodes doing the indexing.
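A minimal elasticsearch.yml for such a node could look roughly like this (node name is just an example, cluster name taken from your output):

cluster.name: es_prod
node.name: es_river
node.master: false
node.data: false

The river then does its SQL fetching and document building on that node and only ships bulk requests to the data nodes.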

Also, you could tweak the JDBC bulk data import and slow it down a bit, so the cluster is not overwhelmed.
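From memory, the relevant river settings are the bulk size and concurrency; please check the exact parameter names and placement against the plugin README for your 1.5.0.5 build, but roughly:

{
  "type": "jdbc",
  "jdbc": {
    "url": "jdbc:...",
    "sql": "...",
    "max_bulk_actions": 1000,
    "max_concurrent_bulk_requests": 1
  }
}

Smaller bulks and a single concurrent bulk request create far fewer threads and much less merge pressure at the same time.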

Another method is to tune segment merging, but this is an advanced topic. Instead, you could add nodes until the segment merge errors are gone; that's the easy route.
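For reference, the merge-related settings in 1.7 are along these lines (the numbers are starting points to experiment with, not recommendations):

index.merge.scheduler.max_thread_count: 1
indices.store.throttle.type: merge
indices.store.throttle.max_bytes_per_sec: 20mb

Fewer merge threads means fewer native threads competing for the per-user limit, at the price of slower merging.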

Sorry, I should have added that part... I did check that. There are no user-specific settings for the process count in limits.conf, so the value I was getting is the default and should apply to all users, right?

It's hard to believe that ordinary users can spawn 514271 processes and threads. On the systems I use (RHEL & CentOS) the default is 1024.

/Michael

I agree the number is crazy large, but it's NOT defined in limits.conf, and that is what ulimit reports (CentOS 6.5), so it would seem that limit is not the underlying problem.

Jörg-

How would I go about slowing down the river? It seems like the low-hanging fruit among the choices...

Thanks
-Doug

Look in /etc/security/limits.d/90-nproc.conf:

*          soft    nproc     1024
root       soft    nproc     unlimited

This file is part of the pam RPM.
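To raise it for ES you can either edit that line or drop an override in the same directory; the file name, user name and value below are only an example:

# /etc/security/limits.d/91-elasticsearch.conf
elasticsearch    soft    nproc    4096
elasticsearch    hard    nproc    4096

Then restart Elasticsearch so the new limit is picked up.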

/Michael

Yep, there it is, thanks! I didn't realize that ulimits were being set by two different config files.

I was able to verify the process picked up the new limits. Appreciate the help!
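(For anyone else hitting this, the check that the running process actually has the new limit is to look at /proc directly, e.g.:

cat /proc/<es_pid>/limits | grep -i 'max processes'

where <es_pid> is the Elasticsearch process id; the 'Max processes' line shows the current soft/hard values.)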

Thanks
-Doug