ElasticSearch throwing “OutOfMemoryError[unable to create new native thread]” error

Hello,

We have set up a cluster on the 2 BFM servers. At the moment only a basic cluster is running, with 1 data node, 1 master node, and 1 client node. The master nodes act as master + data, and each node is configured with a reserved Java heap of 16 GB. Our servers are well provisioned, with 256 GB of RAM. Pretty frequently I notice that the data-only node fails with an out-of-memory error while searching. When it fails, I see this error:

[2015-10-05 04:43:11,731][INFO ][http ] [DEV_DATA] bound_address {inet[/0:0:0:0:0:0:0:0:9240]}, publish_address {inet[/localhost:9240]}
[2015-10-05 04:43:11,734][INFO ][node ] [DEV_DATA] started
[2015-10-05 04:44:26,566][ERROR][marvel.agent.exporter ] [DEV_DATA] create failure (index:[.marvel-2015.10.05] type: [node_stats]): RemoteTransportException[[DEV_MASTER][inet[/localhost:9300]][indices:data/write/bulk[s]]]; nested: OutOfMemoryError[unable to create new native thread];
[2015-10-05 04:44:58,179][WARN ][indices.cluster ] [DEV_DATA] [[.marvel-2015.10.05][0]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.indices.recovery.RecoveryFailedException: [.marvel-2015.10.05][0]: Recovery failed from [DEV_MASTER][F_H_SMI5T4imEDleZ4FZxg][dayrhebfmd001.enterprisenet.org][inet[/localhost:9300]]{max_local_storage_nodes=1, master=true} into [DEV_DATA][muhubm9FSsevgdnyQJTb0Q][dayrhebfmd001.enterprisenet.org][inet[dayrhebfmd001.enterprisenet.org/localhost:9260]]{max_local_storage_nodes=1, master=false}
at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:280)
at org.elasticsearch.indices.recovery.RecoveryTarget.access$700(RecoveryTarget.java:70)
at org.elasticsearch.indices.recovery.RecoveryTarget$RecoveryRunner.doRun(RecoveryTarget.java:567)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.transport.RemoteTransportException: [DEV_MASTER][inet[/localhost:9300]][internal:index/shard/recovery/start_recovery]
Caused by: java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368)

See the log details of the data node above, and please guide us to resolve this issue as soon as possible.

Thanks,
Ganeshbabu R

When ES runs out of heap memory, the problem is typically that you have reached the capacity of the cluster (or of a single node), i.e. you have too much data or issue too many queries for the hardware you have.

Start by bumping Elasticsearch's JVM heap. With 256 GB of RAM you shouldn't have any issues increasing it from 16 GB to 31 GB (the limit before pointers become uncompressed).

General methods to decrease heap usage include enabling doc values (default as of ES 2.0) and configuring the field data cache to actually start evicting fields once the cache is full.
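As a sketch, the standard knob for the heap in the ES 1.x series is the ES_HEAP_SIZE environment variable, set before the node is started (how it is picked up depends on your init script or service wrapper):

```shell
# Set the JVM heap for Elasticsearch 1.x before starting the node.
# 31g keeps compressed object pointers (oops) enabled; going above
# ~32g disables them and can actually reduce usable memory.
export ES_HEAP_SIZE=31g
```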

2 Likes

Generally I agree with @magnusbaeck's comment. Just my 2 cents from managing an ES production cluster: sometimes increasing the heap is good, but on commodity servers, giving the JVM too much heap will also slow things down because of GC activity. This is subject to your cluster's use cases, though.

Another point: recent ES versions expose a lot of interesting metrics for monitoring heap usage, and you may want to read up on that and measure each node's JVM heap.

Thanks for your responses @magnusbaeck @Jason_Wee

I made some changes in the yml file:
indices.fielddata.cache.size: 75%
indices.breaker.fielddata.limit: 85%

I would like to know: is this the right way to enable doc values for the field mappings?

Example of sample mappings:-

Will doc values be enabled for fields of type "string"?

"load_item": {
"mappings": {
"item": {
"properties": {
"load_NAN": {
"type": "string",
"fields": {
"load_NAN_CONTAINS": {
"type": "string",
"index_analyzer": "str_index_analyzer",
"doc_values": true
},
"load_NAN_RAW": {
"type": "string",
"index": "not_analyzed",
"doc_values": true
},
"load_NAN_STARTS": {
"type": "string",
"index_analyzer": "prefix-analyzer",
"search_analyzer": "keyword",
"doc_values": true
}
}
},
"load_STRT_DT": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss",
"doc_values": true
}
}
}
}
}

Please let us know if it's correct or not..

Regards,
Ganesh

Doc values won't work for analyzed strings. Otherwise it looks okay.
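For instance (a sketch reusing the field names from the mapping above): doc values can stay on the not_analyzed sub-field, but should be dropped from the analyzed sub-fields:

```json
"load_NAN_RAW": {
"type": "string",
"index": "not_analyzed",
"doc_values": true
},
"load_NAN_CONTAINS": {
"type": "string",
"index_analyzer": "str_index_analyzer"
}
```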

1 Like

Thanks for your quick response..

Is there any property to disable field data during bulk-load indexing, and can the same property be enabled again at search time?

I am asking this question to avoid the out-of-memory error at search time...

Any other suggestions are welcome..

Regards,
Ganesh

Your heap is OK. The exception means that your OS cannot create new native threads. The number of threads in a JVM is limited by the OS process and stack limits, not by the JVM heap, so the exception message can be a bit misleading.

Anyway, it takes many thousands of threads to trigger such an exception. That vast number is not manageable by a JVM or an operating system and will grind a process to a halt.

So you should inspect your cluster configuration for thread pools that are configured as unbounded. This is a very bad idea and is therefore not the default. Unbounded thread pools are a classic way to bring a cluster down. Set reasonable limits for your thread pools.
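On Linux, a quick way to check the OS-side limits involved (standard commands; which limit is actually being hit varies by setup):

```shell
# Max processes/threads the Elasticsearch user may create;
# "unable to create new native thread" is often this limit being hit.
ulimit -u

# System-wide ceiling on the number of threads, if readable.
if [ -r /proc/sys/kernel/threads-max ]; then
    cat /proc/sys/kernel/threads-max
fi
```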

1 Like

Thanks for your response @jprante

I checked the thread pool limits node by node..

I fetched the thread pool settings using curl:

curl -XGET "localhost:9200/_nodes/thread_pool?pretty"

Questions:-

  1. Are these the unlimited thread pool settings you are talking about?

  2. How do I change the thread pool size?

  3. To determine the size of a thread pool you need to know the # of processors; I have 48 CPU cores. What is the ideal value for the # of processors setting?

  4. Should I add these settings in the .yml file or via a curl command?

Please let us know..
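As an illustration (a sketch for the ES 1.x settings keys; the values shown are assumptions to adapt, not recommendations), bounded thread pools are normally configured statically in elasticsearch.yml:

```yaml
# elasticsearch.yml - bounded thread pools (illustrative values only)
threadpool.search.size: 49         # fixed pool; default derives from # of cores
threadpool.search.queue_size: 1000 # requests beyond this are rejected, not queued forever
threadpool.bulk.size: 48
threadpool.bulk.queue_size: 50
```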

Sample thread pool settings:

"cluster_name" : "ES_DEV",
"nodes" : {
"ycvddsfsdAD221cww" : {
"name" : "Server_DEV_DATA",
"transport_address" : "inet[/localhost:9200]",
"host" : "localhost.enterprisenet.org",
"ip" : "localhost",
"version" : "1.7.2",
"build" : "e43676b",
"http_address" : "inet[/localhost:9200]",
"attributes" : {
"max_local_storage_nodes" : "1",
"master" : "false"
},
"thread_pool" : {
"index" : {
"type" : "fixed",
"min" : 32,
"max" : 32,
"queue_size" : "200"
},
"search" : {
"type" : "fixed",
"min" : 49,
"max" : 49,
"queue_size" : "1k"
},
"bulk" : {
"type" : "fixed",
"min" : 32,
"max" : 32,
"queue_size" : "50"
}
}
},
"y8ilzvjfR0abu0L8Yg" : {
"name" : "DEV_CLIENT",
"transport_address" : "inet[/localhost:9200]",
"host" : "localhost.enterprisenet.org",
"ip" : "localhost",
"version" : "1.7.2",
"http_address" : "inet[/localhost:9200]",
"attributes" : {
"max_local_storage_nodes" : "1",
"data" : "false",
"master" : "false"
},
"thread_pool" : {
"index" : {
"type" : "fixed",
"min" : 32,
"max" : 32,
"queue_size" : "200"
},
"search" : {
"type" : "fixed",
"min" : 49,
"max" : 49,
"queue_size" : "1k"
},
"bulk" : {
"type" : "fixed",
"min" : 32,
"max" : 32,
"queue_size" : "50"
}
}
},
"KxAv6KaEfBaLcsz63g" : {
"name" : "DEV_MASTER",
"transport_address" : "inet[localhost.enterprisenet.org/localhost:9200]",
"host" : "localhost.enterprisenet.org",
"ip" : "localhost",
"version" : "1.7.2",
"http_address" : "inet[/localhost:9200]",
"attributes" : {
"max_local_storage_nodes" : "1",
"master" : "true"
},
"thread_pool" : {
"index" : {
"type" : "fixed",
"min" : 32,
"max" : 32,
"queue_size" : "200"
},
"search" : {
"type" : "fixed",
"min" : 49,
"max" : 49,
"queue_size" : "1k"
},
"bulk" : {
"type" : "fixed",
"min" : 32,
"max" : 32,
"queue_size" : "50"
}
}
}
}
}

Regards,
Ganeshbabu R

Hi Magnus,

For my understanding, I want some clarification regarding analyzed strings.

Examples of analyzed strings
(fields of type "string" with an analyzer)

"fields": {
"load_NAN_CONTAINS": {
"type": "string",
"index_analyzer": "str_index_analyzer",
"doc_values": true
},

Examples of not_analyzed strings
"load_NAN_RAW": {
"type": "string",
"index": "not_analyzed",
"doc_values": true
},

Examples of plain string fields (no "index" setting)
"IMAGE_IND": {
"type": "string",
"doc_values": true
},
"HIST_IND": {
"type": "string",
"doc_values": true
},

Please let me know if this is correct or not..

Thanks,
Ganeshbabu R

I think index defaults to analyzed, in which case IMAGE_IND and HIST_IND are analyzed and can't use doc values.

Thanks for your response.