Hello,
We have set up a cluster on the 2 BFM servers. So far I have only the basic cluster running, with 1 data node, 1 master node and 1 client node. The master node acts as master + data, and each node is configured with a reserved Java heap size of 16 GB. Our servers are well provisioned, with 256 GB of RAM. Quite frequently, I notice that the data-only node fails with an out-of-memory error while searching. When it fails, I see the following error:
[2015-10-05 04:43:11,731][INFO ][http ] [DEV_DATA] bound_address {inet[/0:0:0:0:0:0:0:0:9240]}, publish_address {inet[/localhost:9240]}
[2015-10-05 04:43:11,734][INFO ][node ] [DEV_DATA] started
[2015-10-05 04:44:26,566][ERROR][marvel.agent.exporter ] [DEV_DATA] create failure (index:[.marvel-2015.10.05] type: [node_stats]): RemoteTransportException[[DEV_MASTER][inet[/localhost:9300]][indices:data/write/bulk[s]]]; nested: OutOfMemoryError[unable to create new native thread];
[2015-10-05 04:44:58,179][WARN ][indices.cluster ] [DEV_DATA] [[.marvel-2015.10.05][0]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.indices.recovery.RecoveryFailedException: [.marvel-2015.10.05][0]: Recovery failed from [DEV_MASTER][F_H_SMI5T4imEDleZ4FZxg][dayrhebfmd001.enterprisenet.org][inet[/localhost:9300]]{max_local_storage_nodes=1, master=true} into [DEV_DATA][muhubm9FSsevgdnyQJTb0Q][dayrhebfmd001.enterprisenet.org][inet[dayrhebfmd001.enterprisenet.org/localhost:9260]]{max_local_storage_nodes=1, master=false}
at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:280)
at org.elasticsearch.indices.recovery.RecoveryTarget.access$700(RecoveryTarget.java:70)
at org.elasticsearch.indices.recovery.RecoveryTarget$RecoveryRunner.doRun(RecoveryTarget.java:567)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.transport.RemoteTransportException: [DEV_MASTER][inet[/localhost:9300]][internal:index/shard/recovery/start_recovery]
Caused by: java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368)
Please see the screenshot below for the log details of the data node, and guide us on how to resolve this issue as soon as possible.
Thanks,
Ganeshbabu R