We have vertically scaled Elasticsearch in our cluster to 3 nodes (which appears to be working based on performance testing), but when we add Marvel to this we start getting OutOfMemoryError exceptions. We have set marvel.agent.exporter.es.hosts in every node's configuration, but that has not resolved the issue. Any ideas on how to fix this (error log below)?
Elasticsearch Version - 1.5.2
Marvel Version - 1.3
Heap Size (per node) - 7GB
Open Files (per node) - 65535
Memlock (per node) - unlimited
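For reference, each node's elasticsearch.yml looks roughly like the sketch below (this is es3; es1 and es2 differ only in node name, ports, and paths; the cluster name and directory paths here are placeholders, not our exact values):

cluster.name: escluster                # placeholder
node.name: "-es3"
node.master: true
node.data: true

path.data: /es/es3/data                # each node has its own directories (placeholder paths)
path.work: /es/es3/work
path.logs: /es/es3/logs

http.port: 9202                        # one host, so each node gets its own ports
transport.tcp.port: 9302

bootstrap.mlockall: true               # memlock is set to unlimited

marvel.agent.exporter.es.hosts: ["localhost:9200"]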
[2015-11-23 14:14:27,941][INFO ][cluster.service ] [-es3] detected_master [-es1][m5Ds1kUQQI2ZKelWJax8OA][SVCentral1][inet[/166.17.49.8:9300]]{master=true}, added {[-es1][m5Ds1kUQQI2ZKelWJax8OA][SVCentral1][inet[/166.17.49.8:9300]]{master=true},[-es2][nB6Y4WA2S0ynzGQ-C0Vclg][SVCentral1][inet[/166.17.49.8:9301]]{master=true},}, reason: zen-disco-receive(from master [[-es1][m5Ds1kUQQI2ZKelWJax8OA][SVCentral1][inet[/166.17.49.8:9300]]{master=true}])
[2015-11-23 14:14:27,955][INFO ][marvel.agent.exporter ] [-es3] hosts set to [localhost:9200]
[2015-11-23 14:14:28,043][INFO ][http ] [-es3] bound_address {inet[/0:0:0:0:0:0:0:0:9202]}, publish_address {inet[/166.17.49.8:9202]}
[2015-11-23 14:14:28,043][INFO ][node ] [-es3] started
[2015-11-23 14:23:39,551][WARN ][index.engine ] [-es2] [.marvel-2015.11.23][0] failed engine [out of memory]
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:391)
at org.elasticsearch.index.merge.EnableMergeScheduler.merge(EnableMergeScheduler.java:50)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1985)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1979)
at org.elasticsearch.index.engine.InternalEngine.maybeMerge(InternalEngine.java:741)
at org.elasticsearch.index.shard.IndexShard$EngineMerger$1.run(IndexShard.java:1148)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2015-11-23 14:23:39,554][WARN ][index.shard ] [-es2] [.marvel-2015.11.23][0] Failed to perform scheduled engine optimize/merge
org.elasticsearch.index.engine.OptimizeFailedEngineException: [.marvel-2015.11.23][0] force merge failed
at org.elasticsearch.index.engine.InternalEngine.maybeMerge(InternalEngine.java:744)
at org.elasticsearch.index.shard.IndexShard$EngineMerger$1.run(IndexShard.java:1148)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:391)
at org.elasticsearch.index.merge.EnableMergeScheduler.merge(EnableMergeScheduler.java:50)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1985)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1979)
at org.elasticsearch.index.engine.InternalEngine.maybeMerge(InternalEngine.java:741)
... 4 more
So basically I ran into a similar issue during performance testing (where we push 2GBPS through Logstash over a 12-hour period, with a two-hour spike to 4GBPS) while trying to search the cluster. It looks like the same problem, with es3 throwing an out-of-memory exception.
I have a 3-node system (es1, es2, es3), vertically scaled, where all of the nodes are data nodes and master-eligible. Each node has its own separate data, work, configuration, and logging directories/files. I am wondering if we are missing a key setting that would help us avoid this exception. We are planning to move to doc_values and set cluster.routing.allocation.same_shard.host: true for the next test; a rough sketch of those changes follows.
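Concretely, the planned changes look roughly like this. The allocation setting goes in every node's elasticsearch.yml:

cluster.routing.allocation.same_shard.host: true

For doc_values, since our indices roll daily, the plan is to put the mapping in an index template so new indices pick it up (the template name, type name, and field name below are placeholders for illustration, not our real mapping; in 1.5 a string field needs to be not_analyzed for doc_values to apply):

PUT /_template/domain_metadata_template
{
  "template": "domain_metadata-*",
  "mappings": {
    "my_type": {
      "properties": {
        "domain": {
          "type": "string",
          "index": "not_analyzed",
          "doc_values": true
        }
      }
    }
  }
}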
[2015-11-24 17:39:50,126][WARN ][monitor.jvm ] [-es3] [gc][young][83323][3329] duration [1.5s], collections [1]/[1.8s], total [1.5s]/[27.6s], memory [4.8gb]->[3gb]/[6.7gb], all_pools {[young] [1.8gb]->[1.9mb]/[1.8gb]}{[survivor] [139.8mb]->[96.3mb]/[232.9mb]}{[old] [2.8gb]->[2.9gb]/[4.7gb]}
[2015-11-24 19:22:35,102][INFO ][monitor.jvm ] [-es3] [gc][young][89485][3840] duration [887ms], collections [1]/[1.8s], total [887ms]/[33.2s], memory [4.8gb]->[3.3gb]/[6.7gb], all_pools {[young] [1.4gb]->[9.9mb]/[1.8gb]}{[survivor] [117.5mb]->[118.6mb]/[232.9mb]}{[old] [3.2gb]->[3.2gb]/[4.7gb]}
[2015-11-24 19:29:32,372][WARN ][index.engine ] [-es3] [domain_metadata-2015-11-25][0] failed engine [out of memory]
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:391)
at org.elasticsearch.index.merge.EnableMergeScheduler.merge(EnableMergeScheduler.java:50)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1985)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1979)
at org.elasticsearch.index.engine.InternalEngine.maybeMerge(InternalEngine.java:741)
at org.elasticsearch.index.shard.IndexShard$EngineMerger$1.run(IndexShard.java:1148)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2015-11-24 19:29:32,376][WARN ][index.shard ] [-es3] [domain_metadata-2015-11-25][0] Failed to perform scheduled engine optimize/merge
org.elasticsearch.index.engine.OptimizeFailedEngineException: [domain_metadata-2015-11-25][0] force merge failed
at org.elasticsearch.index.engine.InternalEngine.maybeMerge(InternalEngine.java:744)
at org.elasticsearch.index.shard.IndexShard$EngineMerger$1.run(IndexShard.java:1148)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:391)
at org.elasticsearch.index.merge.EnableMergeScheduler.merge(EnableMergeScheduler.java:50)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1985)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1979)
at org.elasticsearch.index.engine.InternalEngine.maybeMerge(InternalEngine.java:741)
... 4 more
[2015-11-24 19:29:32,376][WARN ][indices.cluster ] [-es3] [[domain_metadata-2015-11-25][0]] marking and sending shard failed due to [engine failure, reason [out of memory]]
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:391)
at org.elasticsearch.index.merge.EnableMergeScheduler.merge(EnableMergeScheduler.java:50)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1985)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1979)
at org.elasticsearch.index.engine.InternalEngine.maybeMerge(InternalEngine.java:741)
at org.elasticsearch.index.shard.IndexShard$EngineMerger$1.run(IndexShard.java:1148)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2015-11-24 19:29:32,472][WARN ][indices.cluster ] [-es3] [[dnsmon-2015-11-24][0]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.indices.recovery.RecoveryFailedException: [dnsmon-2015-11-24][0]: Recovery failed from [-es1][xMkrv9zTQGyED8WFTWx5vA][SVCentral1][inet[/166.17.49.8:9300]]{master=true} into [-es3][c39LaO3ETVGh2wmxLhQ2yw][SVCentral1][inet[/166.17.49.8:9302]]{master=true}