OutOfMemoryError (Java heap space) while enabling replication on 90 indices

We're attempting to create a new Elasticsearch cluster for indexing URLs, but have run into a memory leak when turning replication on for our indices.

The current setup is: 5 x m2.2xlarge, 4 TB mounted on EBS per node (not Provisioned IOPS).

We create one index per day, and will keep the past 90 days around for searching. We have been performing bulk inserts with routing enabled, one day at a time, and have been successful in loading all 90 days. This ended up being approximately 313 million documents. I had inserted with the number of replicas per index set to 0 to increase our bulk insertion rate.
I then started changing the number of replicas per index to 1, one index at a time. I was able to successfully create the replicas for about 70 of the indices (i.e. about 65 or 70 days' worth), but then ran out of heap space.
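
For reference, the replica change itself is just the per-index update-settings call, something along these lines (shown against one of our daily indices):

curl -XPUT 'http://localhost:9200/domain_url_2014-01-01/_settings' -d '
{
  "index" : { "number_of_replicas" : 1 }
}'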

We are planning to bulk insert about 2-4 million records per day in 10-minute intervals, so I would appreciate any advice on the validity of our configuration so far. In particular, we would like to know whether there are any known memory leaks with shard replication or bulk inserts.
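
For context, each bulk request looks roughly like this (the type, field names and routing value here are illustrative, not our real mapping):

curl -XPOST 'http://localhost:9200/_bulk' --data-binary '
{ "index" : { "_index" : "domain_url_2014-01-01", "_type" : "url", "_id" : "1", "_routing" : "example.com" } }
{ "domain" : "example.com", "url" : "http://example.com/some/page" }
'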

Our configuration:

Ubuntu 12.04 LTS
Java 7 u51 (I am aware of https://groups.google.com/forum/#!msg/elasticsearch/D4WNQZSvqSU/zo7ancelKi4J and am doing a rolling restart of the cluster as we speak to move to Java 7 u25).
Marvel was installed on each node, but in order to simplify our setup, I will be removing it during the aforementioned cluster restart.

Elasticsearch 1.0.0

"version" : {
"number" : "1.0.0",
"build_hash" : "a46900e9c72c0a623d71b54016357d5f94c8ea32",
"build_timestamp" : "2014-02-12T16:18:34Z",
"build_snapshot" : false,
"lucene_version" : "4.6"
},

Settings applied for our bulk insert:

{
  "index" : {
    "merge.policy.max_merge_at_once" : 4,
    "merge.policy.segments_per_tier" : 20,
    "refresh_interval" : "-1"   # I will be setting this back to 1s when our backfill/replicas are done
  }
}
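
(For completeness, a sketch of applying that index-level block via the update-settings API; the index name is illustrative, and the same block could equally be supplied at index creation:)

curl -XPUT 'http://localhost:9200/domain_url_2014-01-01/_settings' -d '
{
  "index" : {
    "merge.policy.max_merge_at_once" : 4,
    "merge.policy.segments_per_tier" : 20,
    "refresh_interval" : "-1"
  }
}'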

{
  "transient" : {
    "index.merge.policy.merge_factor" : 30,
    "threadpool.bulk.queue_size" : -1,
    "index.merge.scheduler.max_thread_count" : 5
  }
}
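
(The second block is presumably submitted as transient cluster settings, along these lines:)

curl -XPUT 'http://localhost:9200/_cluster/settings' -d '
{
  "transient" : {
    "index.merge.policy.merge_factor" : 30,
    "threadpool.bulk.queue_size" : -1,
    "index.merge.scheduler.max_thread_count" : 5
  }
}'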

Our Java configuration variables (those that are different from the default /etc/default/elasticsearch in the .deb):

JAVA_HOME=/usr/lib/jvm/java-1.7.0_25-oracle (this was Oracle's Java 7 u51, being backed down during the restart)
ES_HEAP_SIZE=18g
MAX_OPEN_FILES=256000

From a running instance:

/usr/lib/jvm/java-1.7.0_25-oracle/bin/java -Xms18g -Xmx18g -Xss256k -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Delasticsearch -Des.pidfile=/var/run/elasticsearch.pid -Des.path.home=/usr/share/elasticsearch -cp :/usr/share/elasticsearch/lib/elasticsearch-1.0.0.jar:/usr/share/elasticsearch/lib/:/usr/share/elasticsearch/lib/sigar/ -Des.default.config=/etc/elasticsearch/elasticsearch.yml -Des.default.path.home=/usr/share/elasticsearch -Des.default.path.logs=/var/log/elasticsearch -Des.default.path.data=/var/lib/elasticsearch -Des.default.path.work=/tmp/elasticsearch -Des.default.path.conf=/etc/elasticsearch org.elasticsearch.bootstrap.Elasticsearch

The log messages I saw during the OutOfMemoryError:

[2014-04-09 14:17:28,393][WARN ][cluster.action.shard ] [esearch16] [.marvel-2014.04.09][0] received shard failed for [.marvel-2014.04.09][0], node[SK5okikgSWSdbrQdZWET8g], [R], s[INITIALIZING], indexUUID [K4IB1Px3RoOqcjPbta-fKw], reason [Failed to start shard, message [RecoveryFailedException[[.marvel-2014.04.09][0]: Recovery failed from [esearch16][EbYQ9HNzQtexkEZ1PgwpnQ][esearch16.tlys.us][inet[/10.145.167.184:9300]] into [esearch13][SK5okikgSWSdbrQdZWET8g][esearch13.tlys.us][inet[ip-10-185-195-69.ec2.internal/10.185.195.69:9300]]]; nested: RemoteTransportException[[esearch16][inet[/10.145.167.184:9300]][index/shard/recovery/startRecovery]]; nested: RecoveryEngineException[[.marvel-2014.04.09][0] Phase[2] Execution failed]; nested: RemoteTransportException[[esearch13][inet[/10.185.195.69:9300]][index/shard/recovery/prepareTranslog]]; nested: EngineCreationFailureException[[.marvel-2014.04.09][0] failed to create engine]; nested: LockObtainFailedException[Lock obtain timed out: NativeFSLock@/ebsmnt/data/elasticsearch/search-prod/nodes/0/indices/.marvel-2014.04.09/0/index/write.lock]; ]]
[2014-04-09 14:24:47,111][WARN ][cluster.action.shard ] [esearch16] [domain_url_2014-01-03][4] received shard failed for [domain_url_2014-01-03][4], node[SK5okikgSWSdbrQdZWET8g], [R], s[STARTED], indexUUID [OkUadV5JSJGI2B9dKwgMLw], reason [engine failure, message [OutOfMemoryError[Java heap space]]]
[2014-04-09 14:26:06,104][WARN ][cluster.action.shard ] [esearch16] [.marvel-2014.04.09][0] received shard failed for [.marvel-2014.04.09][0], node[SK5okikgSWSdbrQdZWET8g], [R], s[INITIALIZING], indexUUID [K4IB1Px3RoOqcjPbta-fKw], reason [Failed to start shard, message [RecoveryFailedException[[.marvel-2014.04.09][0]: Recovery failed from [esearch16][EbYQ9HNzQtexkEZ1PgwpnQ][esearch16.tlys.us][inet[/10.145.167.184:9300]] into [esearch13][SK5okikgSWSdbrQdZWET8g][esearch13.tlys.us][inet[ip-10-185-195-69.ec2.internal/10.185.195.69:9300]]]; nested: RemoteTransportException[[esearch16][inet[/10.145.167.184:9300]][index/shard/recovery/startRecovery]]; nested: RecoveryEngineException[[.marvel-2014.04.09][0] Phase[2] Execution failed]; nested: RemoteTransportException[[esearch13][inet[/10.185.195.69:9300]][index/shard/recovery/prepareTranslog]]; nested: EngineCreationFailureException[[.marvel-2014.04.09][0] failed to create engine]; nested: LockObtainFailedException[Lock obtain timed out: NativeFSLock@/ebsmnt/data/elasticsearch/search-prod/nodes/0/indices/.marvel-2014.04.09/0/index/write.lock]; ]]
[2014-04-09 14:26:48,562][INFO ][cluster.metadata ] [esearch16] updating number_of_replicas to [0] for indices [.marvel-2014.04.09]
[2014-04-09 14:27:27,235][INFO ][cluster.metadata ] [esearch16] updating number_of_replicas to [0] for indices [.marvel-2014.04.09]
[2014-04-09 14:37:01,359][INFO ][cluster.metadata ] [esearch16] [.marvel-2014.04.09] update_mapping [shard_event] (dynamic)
[2014-04-09 14:37:01,531][INFO ][cluster.metadata ] [esearch16] [.marvel-2014.04.09] update_mapping [routing_event] (dynamic)
[2014-04-09 14:40:51,469][WARN ][cluster.action.shard ] [esearch16] [domain_url_2014-01-01][2] received shard failed for [domain_url_2014-01-01][2], node[BKCZOztRRP6FXVKJSkT_oA], [R], s[STARTED], indexUUID [jDcZyjUrSW6eD3_TH5v0_Q], reason [engine failure, message [OutOfMemoryError[Java heap space]]]
[2014-04-09 14:41:00,353][WARN ][cluster.action.shard ] [esearch16] [domain_url_2014-03-11][2] received shard failed for [domain_url_2014-03-11][2], node[SK5okikgSWSdbrQdZWET8g], [R], s[STARTED], indexUUID [HuQzTDCmTMeS3He3DumnOg], reason [engine failure, message [OutOfMemoryError[Java heap space]]]
[2014-04-09 15:04:32,504][WARN ][cluster.action.shard ] [esearch16] [domain_url_2014-01-03][2] received shard failed for [domain_url_2014-01-03][2], node[BKCZOztRRP6FXVKJSkT_oA], [R], s[STARTED], indexUUID [OkUadV5JSJGI2B9dKwgMLw], reason [engine failure, message [OutOfMemoryError[Java heap space]]]
[2014-04-09 15:12:13,529][WARN ][cluster.action.shard ] [esearch16] [domain_url_2014-01-01][2] received shard failed for [domain_url_2014-01-01][2], node[BKCZOztRRP6FXVKJSkT_oA], [R], s[STARTED], indexUUID [jDcZyjUrSW6eD3_TH5v0_Q], reason [engine failure, message [OutOfMemoryError[Java heap space]]]
[2014-04-09 15:39:24,021][WARN ][cluster.action.shard ] [esearch16] [domain_url_2014-01-03][1] received shard failed for [domain_url_2014-01-03][1], node[4ft2nd1lRE-BdvL2iYGIkg], relocating [BKCZOztRRP6FXVKJSkT_oA], [R], s[INITIALIZING], indexUUID [OkUadV5JSJGI2B9dKwgMLw], reason [Failed to start shard, message [RecoveryFailedException[[domain_url_2014-01-03][1]: Recovery failed from [esearch15][EkR2xgpURrunkxrRnpkzYQ][esearch15.tlys.us][inet[ip-10-185-171-146.ec2.internal/10.185.171.146:9300]] into [esearch14][4ft2nd1lRE-BdvL2iYGIkg][esearch14.tlys.us][inet[ip-10-184-39-23.ec2.internal/10.184.39.23:9300]]]; nested: RemoteTransportException[[esearch15][inet[/10.185.171.146:9300]][index/shard/recovery/startRecovery]]; nested: RecoveryEngineException[[domain_url_2014-01-03][1] Phase[2] Execution failed]; nested: RemoteTransportException[[esearch14][inet[/10.184.39.23:9300]][index/shard/recovery/prepareTranslog]]; nested: OutOfMemoryError[Java heap space]; ]]
[2014-04-09 15:42:51,176][WARN ][cluster.action.shard ] [esearch16] [domain_url_2014-01-06][1] received shard failed for [domain_url_2014-01-06][1], node[4ft2nd1lRE-BdvL2iYGIkg], [R], s[STARTED], indexUUID [51jdwEMrTGKtTpA90ZjXiQ], reason [engine failure, message [OutOfMemoryError[Java heap space]]]
[2014-04-09 15:54:42,711][DEBUG][action.admin.cluster.stats] [esearch16] failed to execute on node [4ft2nd1lRE-BdvL2iYGIkg]
org.elasticsearch.transport.RemoteTransportException: [esearch14][inet[/10.184.39.23:9300]][cluster/stats/n]
Caused by: org.elasticsearch.index.engine.EngineClosedException: [domain_url_2014-01-01][1] CurrentState[CLOSED]
at org.elasticsearch.index.engine.internal.InternalEngine.ensureOpen(InternalEngine.java:913)
at org.elasticsearch.index.engine.internal.InternalEngine.segmentsStats(InternalEngine.java:1130)
at org.elasticsearch.index.shard.service.InternalIndexShard.segmentStats(InternalIndexShard.java:532)
at org.elasticsearch.action.admin.indices.stats.CommonStats.<init>(CommonStats.java:161)
at org.elasticsearch.action.admin.indices.stats.ShardStats.<init>(ShardStats.java:49)
at org.elasticsearch.action.admin.cluster.stats.TransportClusterStatsAction.nodeOperation(TransportClusterStatsAction.java:130)
at org.elasticsearch.action.admin.cluster.stats.TransportClusterStatsAction.nodeOperation(TransportClusterStatsAction.java:54)
at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:281)
at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:272)
at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:270)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.util.fst.BytesStore.<init>(BytesStore.java:62)
at org.apache.lucene.util.fst.FST.<init>(FST.java:366)
at org.apache.lucene.util.fst.FST.<init>(FST.java:301)
at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader.<init>(BlockTreeTermsReader.java:481)
at org.apache.lucene.codecs.BlockTreeTermsReader.<init>(BlockTreeTermsReader.java:175)
at org.apache.lucene.codecs.lucene41.Lucene41PostingsFormat.fieldsProducer(Lucene41PostingsFormat.java:437)
at org.elasticsearch.index.codec.postingsformat.BloomFilterPostingsFormat$BloomFilteredFieldsProducer.<init>(BloomFilterPostingsFormat.java:131)
at org.elasticsearch.index.codec.postingsformat.BloomFilterPostingsFormat.fieldsProducer(BloomFilterPostingsFormat.java:102)
at org.elasticsearch.index.codec.postingsformat.Elasticsearch090PostingsFormat.fieldsProducer(Elasticsearch090PostingsFormat.java:79)
at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.<init>(PerFieldPostingsFormat.java:195)
at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProducer(PerFieldPostingsFormat.java:244)
at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:115)
at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:95)
at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:141)
at org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:235)
at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:100)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:382)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:111)
at org.apache.lucene.search.XSearcherManager.<init>(XSearcherManager.java:94)
at org.elasticsearch.index.engine.internal.InternalEngine.buildSearchManager(InternalEngine.java:1462)
at org.elasticsearch.index.engine.internal.InternalEngine.flush(InternalEngine.java:801)
at org.elasticsearch.index.engine.internal.InternalEngine.updateIndexingBufferSize(InternalEngine.java:223)
at org.elasticsearch.indices.memory.IndexingMemoryController$ShardsIndicesStatusChecker.run(IndexingMemoryController.java:201)
at org.elasticsearch.threadpool.ThreadPool$LoggingRunnable.run(ThreadPool.java:437)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
... 3 more
[2014-04-09 15:54:51,827][WARN ][cluster.action.shard ] [esearch16] [domain_url_2014-01-01][1] received shard failed for [domain_url_2014-01-01][1], node[4ft2nd1lRE-BdvL2iYGIkg], [R], s[STARTED], indexUUID [jDcZyjUrSW6eD3_TH5v0_Q], reason [engine failure, message [OutOfMemoryError[Java heap space]]]




Also, the uncommented portions of our elasticsearch.yml:

bootstrap.mlockall: true
gateway.type: local
gateway.recover_after_nodes: 4
gateway.recover_after_time: 5m
gateway.expected_nodes: 4
indices.recovery.max_size_per_sec: 500mb
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.timeout: 20s
discovery.zen.ping.multicast.enabled: false

index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 5s
index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.query.trace: 500ms

index.indexing.slowlog.threshold.index.warn: 10s
index.indexing.slowlog.threshold.index.info: 5s
index.indexing.slowlog.threshold.index.debug: 2s
index.indexing.slowlog.threshold.index.trace: 500ms

threadpool:
  bulk:
    type: fixed
    min: 1
    size: 30
    wait_time: 30s
    queue_size: -1
  index:
    type: fixed
    min: 1
    size: 30
    wait_time: 30s
    queue_size: -1

discovery.zen.fd.ping_interval: 10s
discovery.zen.fd.ping_timeout: 60s
discovery.zen.fd.ping_retries: 10

index.translog.flush_threshold_ops: 20000
index.translog.flush_threshold_size: 400mb
index.translog.flush_threshold_period: 60m
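
(A sanity check worth running against this config is confirming that bootstrap.mlockall actually took effect on every node; the nodes info API reports it:)

curl 'http://localhost:9200/_nodes/process?pretty'   # each node should report "mlockall" : true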


I've had another OutOfMemoryError after restarting the cluster and moving to Java 7u25. I've attached the output of the offending node's log.
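
In case it's useful, a rough sketch of one way to see what is filling the heap on the affected node while it is still up (the pid comes from the pidfile in our startup command):

jmap -histo:live $(cat /var/run/elasticsearch.pid) | head -30   # classes dominating the live heap
jstat -gcutil $(cat /var/run/elasticsearch.pid) 5s              # old-gen occupancy and GC activity over time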

On Wed, Apr 9, 2014 at 5:02 PM, Jesse Davis
jesse.michael.davis@gmail.comwrote:

We're attempting to create a new Elasticsearch cluster for indexing URLs, but have run into a memory leak when turning replication on for our indices.

The current setup is: 5 x m2.2xlarge, 4 TB mounted on EBS per node (not Provisioned IOPs).

We create one index per day, and will keep the past 90 days around for searching. We have been been performing bulk inserts with routing enabled, 1 day at a time, and have been successful in loading all 90 days. This ended up being approximately 313 million documents. I had inserted with the number of replicas per index set to 0 to increase our bulk insertion rate.
I then started changing the number of replicas per index to 1, one index at a time. I was able to successfully create the replicas for about 70 of the shards (i.e. about 65 or 70 days), but then ran out of heap space.

We are planning to bulk insert about 2-4 millions records per day in 10 minute intervals, so I would appreciate any advice on the validity of our configuration so far. In particular, we would like to know if there's any known memory leaks with shard replication or bulk inserts.

Our configuration:

Ubuntu 12.04 LTS
Java 7 u51 (I am aware of Redirecting to Google Groups and am doing a rolling restart of the cluster as we speak to move to Java 7 u25).
Marvel was installed on each node, but in order to simplify our setup, I will be removing it during the aforementioned cluster restart.

Elasticsearch 1.0.0

"version" : {
"number" : "1.0.0",
"build_hash" : "a46900e9c72c0a623d71b54016357d5f94c8ea32",
"build_timestamp" : "2014-02-12T16:18:34Z",
"build_snapshot" : false,
"lucene_version" : "4.6"
},

Settings applied for our bulk insert:

{
"index" : {
"merge.policy.max_merge_at_once" : 4,
"merge.policy.segments_per_tier" : 20,
"refresh_interval" : "-1" # I will be setting this back to 1s when our backfill/replicas are done
}
}

{
"transient" : {
"index.merge.policy.merge_factor" : 30,
"threadpool.bulk.queue_size" : -1,
"index.merge.scheduler.max_thread_count" : 5
}
}

Our Java configuration variables (those that are different from the default /etc/default/elasticsearch in the .deb):

JAVA_HOME=/usr/lib/jvm/java-1.7.0_25-oracle (this was Oracle's Java 7 u51, being backed down during the restart)
ES_HEAP_SIZE=18g
MAX_OPEN_FILES=256000

From a running instance:

/usr/lib/jvm/java-1.7.0_25-oracle/bin/java -Xms18g -Xmx18g -Xss256k -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Delasticsearch -Des.pidfile=/var/run/elasticsearch.pid -Des.path.home=/usr/share/elasticsearch -cp :/usr/share/elasticsearch/lib/elasticsearch-1.0.0.jar:/usr/share/elasticsearch/lib/:/usr/share/elasticsearch/lib/sigar/ -Des.default.config=/etc/elasticsearch/elasticsearch.yml -Des.default.path.home=/usr/share/elasticsearch -Des.default.path.logs=/var/log/elasticsearch -Des.default.path.data=/var/lib/elasticsearch -Des.default.path.work=/tmp/elasticsearch -Des.default.path.conf=/etc/elasticsearch org.elasticsearch.bootstrap.Elasticsearch

The log message I saw during the OutOfMemoryError:

[2014-04-09 14:17:28,393][WARN ][cluster.action.shard ] [esearch16] [.marvel-2014.04.09][0] received shard failed for [.marvel-2014.04.09][0], node[SK5okikgSWSdbrQdZWET8g], [R], s[INITIALIZING], in
dexUUID [K4IB1Px3RoOqcjPbta-fKw], reason [Failed to start shard, message [RecoveryFailedException[[.marvel-2014.04.09][0]: Recovery failed from [esearch16][EbYQ9HNzQtexkEZ1PgwpnQ][esearch16.tlys.us][in
et[/10.145.167.184:9300]] into [esearch13][SK5okikgSWSdbrQdZWET8g][esearch13.tlys.us][inet[ip-10-185-195-69.ec2.internal/10.185.195.69:9300]]]; nested: RemoteTransportException[[esearch16][inet[/10.145
.167.184:9300]][index/shard/recovery/startRecovery]]; nested: RecoveryEngineException[[.marvel-2014.04.09][0] Phase[2] Execution failed]; nested: RemoteTransportException[[esearch13][inet[/10.185.195.6
9:9300]][index/shard/recovery/prepareTranslog]]; nested: EngineCreationFailureException[[.marvel-2014.04.09][0] failed to create engine]; nested: LockObtainFailedException[Lock obtain timed out: Native
FSLock@/ebsmnt/data/elasticsearch/search-prod/nodes/0/indices/.marvel-2014.04.09/0/index/write.lock]; ]]
[2014-04-09 14:24:47,111][WARN ][cluster.action.shard ] [esearch16] [domain_url_2014-01-03][4] received shard failed for [domain_url_2014-01-03][4], node[SK5okikgSWSdbrQdZWET8g], [R], s[STARTED], i
ndexUUID [OkUadV5JSJGI2B9dKwgMLw], reason [engine failure, message [OutOfMemoryError[Java heap space]]]
[2014-04-09 14:26:06,104][WARN ][cluster.action.shard ] [esearch16] [.marvel-2014.04.09][0] received shard failed for [.marvel-2014.04.09][0], node[SK5okikgSWSdbrQdZWET8g], [R], s[INITIALIZING], in
dexUUID [K4IB1Px3RoOqcjPbta-fKw], reason [Failed to start shard, message [RecoveryFailedException[[.marvel-2014.04.09][0]: Recovery failed from [esearch16][EbYQ9HNzQtexkEZ1PgwpnQ][esearch16.tlys.us][in
et[/10.145.167.184:9300]] into [esearch13][SK5okikgSWSdbrQdZWET8g][esearch13.tlys.us][inet[ip-10-185-195-69.ec2.internal/10.185.195.69:9300]]]; nested: RemoteTransportException[[esearch16][inet[/10.145
.167.184:9300]][index/shard/recovery/startRecovery]]; nested: RecoveryEngineException[[.marvel-2014.04.09][0] Phase[2] Execution failed]; nested: RemoteTransportException[[esearch13][inet[/10.185.195.6
9:9300]][index/shard/recovery/prepareTranslog]]; nested: EngineCreationFailureException[[.marvel-2014.04.09][0] failed to create engine]; nested: LockObtainFailedException[Lock obtain timed out: Native
FSLock@/ebsmnt/data/elasticsearch/search-prod/nodes/0/indices/.marvel-2014.04.09/0/index/write.lock]; ]]
[2014-04-09 14:26:48,562][INFO ][cluster.metadata ] [esearch16] updating number_of_replicas to [0] for indices [.marvel-2014.04.09]
[2014-04-09 14:27:27,235][INFO ][cluster.metadata ] [esearch16] updating number_of_replicas to [0] for indices [.marvel-2014.04.09]
[2014-04-09 14:37:01,359][INFO ][cluster.metadata ] [esearch16] [.marvel-2014.04.09] update_mapping [shard_event] (dynamic)
[2014-04-09 14:37:01,531][INFO ][cluster.metadata ] [esearch16] [.marvel-2014.04.09] update_mapping [routing_event] (dynamic)
[2014-04-09 14:40:51,469][WARN ][cluster.action.shard ] [esearch16] [domain_url_2014-01-01][2] received shard failed for [domain_url_2014-01-01][2], node[BKCZOztRRP6FXVKJSkT_oA], [R], s[STARTED], i
ndexUUID [jDcZyjUrSW6eD3_TH5v0_Q], reason [engine failure, message [OutOfMemoryError[Java heap space]]]
[2014-04-09 14:41:00,353][WARN ][cluster.action.shard ] [esearch16] [domain_url_2014-03-11][2] received shard failed for [domain_url_2014-03-11][2], node[SK5okikgSWSdbrQdZWET8g], [R], s[STARTED], i
ndexUUID [HuQzTDCmTMeS3He3DumnOg], reason [engine failure, message [OutOfMemoryError[Java heap space]]]
[2014-04-09 15:04:32,504][WARN ][cluster.action.shard ] [esearch16] [domain_url_2014-01-03][2] received shard failed for [domain_url_2014-01-03][2], node[BKCZOztRRP6FXVKJSkT_oA], [R], s[STARTED], i
ndexUUID [OkUadV5JSJGI2B9dKwgMLw], reason [engine failure, message [OutOfMemoryError[Java heap space]]]
[2014-04-09 15:12:13,529][WARN ][cluster.action.shard ] [esearch16] [domain_url_2014-01-01][2] received shard failed for [domain_url_2014-01-01][2], node[BKCZOztRRP6FXVKJSkT_oA], [R], s[STARTED], i
ndexUUID [jDcZyjUrSW6eD3_TH5v0_Q], reason [engine failure, message [OutOfMemoryError[Java heap space]]]
[2014-04-09 15:39:24,021][WARN ][cluster.action.shard ] [esearch16] [domain_url_2014-01-03][1] received shard failed for [domain_url_2014-01-03][1], node[4ft2nd1lRE-BdvL2iYGIkg], relocating [BKCZOz
tRRP6FXVKJSkT_oA], [R], s[INITIALIZING], indexUUID [OkUadV5JSJGI2B9dKwgMLw], reason [Failed to start shard, message [RecoveryFailedException[[domain_url_2014-01-03][1]: Recovery failed from [esearch15]
[EkR2xgpURrunkxrRnpkzYQ][esearch15.tlys.us][inet[ip-10-185-171-146.ec2.internal/10.185.171.146:9300]] into [esearch14][4ft2nd1lRE-BdvL2iYGIkg][esearch14.tlys.us][inet[ip-10-184-39-23.ec2.internal/10.18
4.39.23:9300]]]; nested: RemoteTransportException[[esearch15][inet[/10.185.171.146:9300]][index/shard/recovery/startRecovery]]; nested: RecoveryEngineException[[domain_url_2014-01-03][1] Phase[2] Execu
tion failed]; nested: RemoteTransportException[[esearch14][inet[/10.184.39.23:9300]][index/shard/recovery/prepareTranslog]]; nested: OutOfMemoryError[Java heap space]; ]]
[2014-04-09 15:42:51,176][WARN ][cluster.action.shard ] [esearch16] [domain_url_2014-01-06][1] received shard failed for [domain_url_2014-01-06][1], node[4ft2nd1lRE-BdvL2iYGIkg], [R], s[STARTED], i
ndexUUID [51jdwEMrTGKtTpA90ZjXiQ], reason [engine failure, message [OutOfMemoryError[Java heap space]]]
[2014-04-09 15:54:42,711][DEBUG][action.admin.cluster.stats] [esearch16] failed to execute on node [4ft2nd1lRE-BdvL2iYGIkg]
org.elasticsearch.transport.RemoteTransportException: [esearch14][inet[/10.184.39.23:9300]][cluster/stats/n]
Caused by: org.elasticsearch.index.engine.EngineClosedException: [domain_url_2014-01-01][1] CurrentState[CLOSED]
at org.elasticsearch.index.engine.internal.InternalEngine.ensureOpen(InternalEngine.java:913)
at org.elasticsearch.index.engine.internal.InternalEngine.segmentsStats(InternalEngine.java:1130)
at org.elasticsearch.index.shard.service.InternalIndexShard.segmentStats(InternalIndexShard.java:532)
at org.elasticsearch.action.admin.indices.stats.CommonStats.(CommonStats.java:161)
at org.elasticsearch.action.admin.indices.stats.ShardStats.(ShardStats.java:49)
at org.elasticsearch.action.admin.cluster.stats.TransportClusterStatsAction.nodeOperation(TransportClusterStatsAction.java:130)
at org.elasticsearch.action.admin.cluster.stats.TransportClusterStatsAction.nodeOperation(TransportClusterStatsAction.java:54)
at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:281)
at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:272)
at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:270)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.util.fst.BytesStore.(BytesStore.java:62)
at org.apache.lucene.util.fst.FST.(FST.java:366)
at org.apache.lucene.util.fst.FST.(FST.java:301)
at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader.(BlockTreeTermsReader.java:481)
at org.apache.lucene.codecs.BlockTreeTermsReader.(BlockTreeTermsReader.java:175)
at org.apache.lucene.codecs.lucene41.Lucene41PostingsFormat.fieldsProducer(Lucene41PostingsFormat.java:437)
at org.elasticsearch.index.codec.postingsformat.BloomFilterPostingsFormat$BloomFilteredFieldsProducer.(BloomFilterPostingsFormat.java:131)
at org.elasticsearch.index.codec.postingsformat.BloomFilterPostingsFormat.fieldsProducer(BloomFilterPostingsFormat.java:102)
at org.elasticsearch.index.codec.postingsformat.Elasticsearch090PostingsFormat.fieldsProducer(Elasticsearch090PostingsFormat.java:79)
at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.(PerFieldPostingsFormat.java:195)
at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProducer(PerFieldPostingsFormat.java:244)
at org.apache.lucene.index.SegmentCoreReaders.(SegmentCoreReaders.java:115)
at org.apache.lucene.index.SegmentReader.(SegmentReader.java:95)
at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:141)
at org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:235)
at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:100)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:382)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:111)
at org.apache.lucene.search.XSearcherManager.(XSearcherManager.java:94)
at org.elasticsearch.index.engine.internal.InternalEngine.buildSearchManager(InternalEngine.java:1462)
at org.elasticsearch.index.engine.internal.InternalEngine.flush(InternalEngine.java:801)
at org.elasticsearch.index.engine.internal.InternalEngine.updateIndexingBufferSize(InternalEngine.java:223)
at org.elasticsearch.indices.memory.IndexingMemoryController$ShardsIndicesStatusChecker.run(IndexingMemoryController.java:201)
at org.elasticsearch.threadpool.ThreadPool$LoggingRunnable.run(ThreadPool.java:437)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
... 3 more
[2014-04-09 15:54:51,827][WARN ][cluster.action.shard ] [esearch16] [domain_url_2014-01-01][1] received shard failed for [domain_url_2014-01-01][1], node[4ft2nd1lRE-BdvL2iYGIkg], [R], s[STARTED], indexUUID [jDcZyjUrSW6eD3_TH5v0_Q], reason [engine failure, message [OutOfMemoryError[Java heap space]]]

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/d2d66060-d0ac-49bf-b9ab-f4157ac3a4d5%40googlegroups.comhttps://groups.google.com/d/msgid/elasticsearch/d2d66060-d0ac-49bf-b9ab-f4157ac3a4d5%40googlegroups.com?utm_medium=email&utm_source=footer
.
For more options, visit https://groups.google.com/d/optout.

--
Jesse M. Davis
jesse.michael.davis@gmail.com
206.226.9575

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CANekx7EiG_obujMv5cGtcX258bFgV%3Dw6L-eR2OmNEduNgLTKbA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

I've also attached the output from running /_nodes/stats as well.
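(For reference, the stats were pulled with a call along these lines; the host and port are illustrative and assume the default HTTP settings:)

curl -XGET 'http://localhost:9200/_nodes/stats?pretty'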

On Wed, Apr 9, 2014 at 5:31 PM, Jesse Davis <jesse.michael.davis@gmail.com> wrote:

As well, the uncommented sections of our elasticsearch.yml:

bootstrap.mlockall: true
gateway.type: local
gateway.recover_after_nodes: 4
gateway.recover_after_time: 5m
gateway.expected_nodes: 4
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.timeout: 20s
discovery.zen.ping.multicast.enabled: false

index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 5s
index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.query.trace: 500ms

index.indexing.slowlog.threshold.index.warn: 10s
index.indexing.slowlog.threshold.index.info: 5s
index.indexing.slowlog.threshold.index.debug: 2s
index.indexing.slowlog.threshold.index.trace: 500ms

threadpool:
  bulk:
    type: fixed
    min: 1
    size: 30
    wait_time: 30s
    queue_size: -1
  index:
    type: fixed
    min: 1
    size: 30
    wait_time: 30s
    queue_size: -1

discovery.zen.fd.ping_interval: 10s
discovery.zen.fd.ping_timeout: 60s
discovery.zen.fd.ping_retries: 10

index.translog.flush_threshold_ops: 20000
index.translog.flush_threshold_size: 400mb
index.translog.flush_threshold_period: 60m

Hi Jesse,

It looks like your nodes don't have enough heap to load the Lucene data
structures into memory. When you added the replicas you doubled the number of
Lucene indices and thus doubled the memory requirements. To reduce memory
usage you can consider using fewer shards per index (Lucene structures will
be shared); this might cause indexing to be slower, depending on your
hardware and on what your shard settings are now.
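Note that the number of primary shards can only be set when an index is
created, so this would apply to the new daily indices (or to a reindex of the
old ones). A minimal sketch, with an illustrative index name and shard count:

curl -XPUT 'http://localhost:9200/domain_url_2014-04-11' -d '{
  "settings" : {
    "index.number_of_shards" : 3,
    "index.number_of_replicas" : 1
  }
}'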

I also wonder why you tweaked the merge policy settings and thread pools. The
defaults are good. Not merging correctly can also increase memory usage.
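If you want to move back toward the defaults without a restart, the merge
policy settings are dynamic per index and the bulk thread pool can be changed
through the cluster settings API. A sketch, assuming 10 for both merge
settings and 50 for the bulk queue are still the 1.0 defaults (worth
double-checking against the reference docs), and using one index name as an
example:

curl -XPUT 'http://localhost:9200/domain_url_2014-01-01/_settings' -d '{
  "index" : {
    "merge.policy.max_merge_at_once" : 10,
    "merge.policy.segments_per_tier" : 10
  }
}'

curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient" : { "threadpool.bulk.queue_size" : 50 }
}'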
