We have one particularly large index in our cluster: it contains tens of millions of documents and has a lot of nested documents too. Prior to 1.4.0 Beta 1 (including 1.2.x and 1.3.x), the index re-initialised successfully on a node with 8GB allocated to Elasticsearch (16GB+ available to the OS). Since 1.4.0 Beta 1 (and still on 1.4.0), we get an OOM exception during recovery (startup log and exception stack below). At that point the node stops recovering (expected, I guess) and becomes unresponsive. All data nodes suffer the same fate, and the entire cluster becomes unresponsive.
```
[2014-11-07 17:12:39,895][WARN ][common.jna ] unable to link C library. native methods (mlockall) will be disabled.
[2014-11-07 17:12:40,077][INFO ][node ] [dvlp_FRONTEND2] version[1.4.0], pid[9052], build[bc94bd8/2014-11-05T14:26:12Z]
[2014-11-07 17:12:40,077][INFO ][node ] [dvlp_FRONTEND2] initializing ...
[2014-11-07 17:12:40,129][INFO ][plugins ] [dvlp_FRONTEND2] loaded [cloud-aws], sites [bigdesk, head, inquisitor, kopf]
[2014-11-07 17:12:45,220][INFO ][node ] [dvlp_FRONTEND2] initialized
[2014-11-07 17:12:45,220][INFO ][node ] [dvlp_FRONTEND2] starting ...
[2014-11-07 17:12:45,438][INFO ][transport ] [dvlp_FRONTEND2] bound_address {inet[/0:0:0:0:0:0:0:0:50882]}, publish_address {inet[FRONTEND2/192.168.10.73:50882]}
[2014-11-07 17:12:45,452][INFO ][discovery ] [dvlp_FRONTEND2] dvlp/C2f-euXcRc-cEv3dnsBnXw
[2014-11-07 17:13:15,451][WARN ][discovery ] [dvlp_FRONTEND2] waited for 30s and no initial state was set by the discovery
[2014-11-07 17:13:15,468][INFO ][http ] [dvlp_FRONTEND2] bound_address {inet[/0:0:0:0:0:0:0:0:50881]}, publish_address {inet[frontend2/192.168.10.73:50881]}
[2014-11-07 17:13:15,468][INFO ][node ] [dvlp_FRONTEND2] started
[2014-11-07 17:13:48,552][INFO ][discovery.zen ] [dvlp_FRONTEND2] failed to send join request to master [[dvlp_FRONTEND2_coordinator][jwhGk5NyTx-E1HInKTLDkg][FRONTEND2][inet[/192.168.10.73:55591]]{datacentrename=site2, data=false, nodename=dvlp_FRONTEND2_coordinator, master=true}], reason [ElasticsearchTimeoutException[Timeout waiting for task.]]
[2014-11-07 17:14:51,597][INFO ][discovery.zen ] [dvlp_FRONTEND2] failed to send join request to master [[dvlp_FRONTEND2_coordinator][jwhGk5NyTx-E1HInKTLDkg][FRONTEND2][inet[/192.168.10.73:55591]]{datacentrename=site2, data=false, nodename=dvlp_FRONTEND2_coordinator, master=true}], reason [ElasticsearchTimeoutException[Timeout waiting for task.]]
[2014-11-07 17:15:54,633][INFO ][discovery.zen ] [dvlp_FRONTEND2] failed to send join request to master [[dvlp_FRONTEND2_coordinator][jwhGk5NyTx-E1HInKTLDkg][FRONTEND2][inet[/192.168.10.73:55591]]{datacentrename=site2, data=false, nodename=dvlp_FRONTEND2_coordinator, master=true}], reason [ElasticsearchTimeoutException[Timeout waiting for task.]]
[2014-11-07 17:16:57,647][INFO ][discovery.zen ] [dvlp_FRONTEND2] failed to send join request to master [[dvlp_FRONTEND2_coordinator][jwhGk5NyTx-E1HInKTLDkg][FRONTEND2][inet[/192.168.10.73:55591]]{datacentrename=site2, data=false, nodename=dvlp_FRONTEND2_coordinator, master=true}], reason [ElasticsearchTimeoutException[Timeout waiting for task.]]
[2014-11-07 17:18:00,664][INFO ][discovery.zen ] [dvlp_FRONTEND2] failed to send join request to master [[dvlp_FRONTEND2_coordinator][jwhGk5NyTx-E1HInKTLDkg][FRONTEND2][inet[/192.168.10.73:55591]]{datacentrename=site2, data=false, nodename=dvlp_FRONTEND2_coordinator, master=true}], reason [ElasticsearchTimeoutException[Timeout waiting for task.]]
[2014-11-07 17:19:03,675][INFO ][discovery.zen ] [dvlp_FRONTEND2] failed to send join request to master [[dvlp_FRONTEND2_coordinator][jwhGk5NyTx-E1HInKTLDkg][FRONTEND2][inet[/192.168.10.73:55591]]{datacentrename=site2, data=false, nodename=dvlp_FRONTEND2_coordinator, master=true}], reason [ElasticsearchTimeoutException[Timeout waiting for task.]]
[2014-11-07 17:20:06,684][INFO ][discovery.zen ] [dvlp_FRONTEND2] failed to send join request to master [[dvlp_FRONTEND2_coordinator][jwhGk5NyTx-E1HInKTLDkg][FRONTEND2][inet[/192.168.10.73:55591]]{datacentrename=site2, data=false, nodename=dvlp_FRONTEND2_coordinator, master=true}], reason [ElasticsearchTimeoutException[Timeout waiting for task.]]
[2014-11-07 17:20:36,950][INFO ][discovery.zen ] [dvlp_FRONTEND2] failed to send join request to master [[dvlp_FRONTEND2_coordinator][jwhGk5NyTx-E1HInKTLDkg][FRONTEND2][inet[/192.168.10.73:55591]]{datacentrename=site2, data=false, nodename=dvlp_FRONTEND2_coordinator, master=true}], reason [NodeDisconnectedException[[dvlp_FRONTEND2_coordinator][inet[/192.168.10.73:55591]][internal:discovery/zen/join] disconnected]]
[2014-11-07 17:20:41,171][WARN ][transport.netty ] [dvlp_FRONTEND2] Message not fully read (response) for [85] handler future(org.elasticsearch.transport.EmptyTransportResponseHandler@2060e2c8), error [true], resetting
[2014-11-07 17:20:41,171][INFO ][discovery.zen ] [dvlp_FRONTEND2] failed to send join request to master [[dvlp_FRONTEND1_coordinator][4y8Hh5kAQPK2Ie3gzc58Ww][FRONTEND1][inet[/192.168.10.70:55858]]{datacentrename=site1, data=false, nodename=dvlp_FRONTEND1_coordinator, master=true}], reason [RemoteTransportException[Failed to deserialize exception response from stream]; nested: TransportSerializationException[Failed to deserialize exception response from stream]; nested: StreamCorruptedException[unexpected end of block data]; ]
[2014-11-07 17:20:45,520][INFO ][discovery.zen ] [dvlp_FRONTEND2] failed to send join request to master [[dvlp_FRONTEND2_coordinator][-O87CxU3RRSTHZkuC985Yw][FRONTEND2][inet[/192.168.10.73:55591]]{datacentrename=site2, data=false, nodename=dvlp_FRONTEND2_coordinator, master=true}], reason [RemoteTransportException[[dvlp_FRONTEND2_coordinator][inet[/192.168.10.73:55591]][internal:discovery/zen/join]]; nested: ElasticsearchIllegalStateException[Node [[dvlp_FRONTEND2_coordinator][-O87CxU3RRSTHZkuC985Yw][FRONTEND2][inet[FRONTEND2/192.168.10.73:55591]]{datacentrename=site2, data=false, nodename=dvlp_FRONTEND2_coordinator, master=true}] not master for join request from [[dvlp_FRONTEND2][C2f-euXcRc-cEv3dnsBnXw][FRONTEND2][inet[/192.168.10.73:50882]]{datacentrename=site2, nodename=dvlp_FRONTEND2, master=false}]]; ], tried [3] times
[2014-11-07 17:20:48,831][INFO ][cluster.service ] [dvlp_FRONTEND2] detected_master [dvlp_FRONTEND2_coordinator][-O87CxU3RRSTHZkuC985Yw][FRONTEND2][inet[/192.168.10.73:55591]]{datacentrename=site2, data=false, nodename=dvlp_FRONTEND2_coordinator, master=true}, added {[dvlp_DEVBATH01.exabre.co.uk_loadbalancer][8i4izXAUQiWeS2arwV9LeA][DEVBATH01][inet[/192.168.10.65:12184]]{datacentrename=site1, data=false, nodename=dvlp_DEVBATH01.exabre.co.uk_loadbalancer, master=true},[dvlp_FRONTEND2_coordinator][-O87CxU3RRSTHZkuC985Yw][FRONTEND2][inet[/192.168.10.73:55591]]{datacentrename=site2, data=false, nodename=dvlp_FRONTEND2_coordinator, master=true},[dvlp_FRONTEND2_loadbalancer][joVXc_fGTx-SC_YwJ2YBmQ][FRONTEND2][inet[/192.168.10.73:65341]]{datacentrename=site2, data=false, nodename=dvlp_FRONTEND2_loadbalancer, master=false},[dvlp_FRONTEND1_loadbalancer][snDHwo0YTR6VsAFV9nBcxw][FRONTEND1][inet[/192.168.10.70:55054]]{datacentrename=site1, data=false, nodename=dvlp_FRONTEND1_loadbalancer, master=false},}, reason: zen-disco-receive(from master [[dvlp_FRONTEND2_coordinator][-O87CxU3RRSTHZkuC985Yw][FRONTEND2][inet[/192.168.10.73:55591]]{datacentrename=site2, data=false, nodename=dvlp_FRONTEND2_coordinator, master=true}])
[2014-11-07 17:21:01,937][INFO ][cluster.service ] [dvlp_FRONTEND2] added {[dvlp_FRONTEND1_coordinator][4y8Hh5kAQPK2Ie3gzc58Ww][FRONTEND1][inet[/192.168.10.70:55858]]{datacentrename=site1, data=false, nodename=dvlp_FRONTEND1_coordinator, master=true},}, reason: zen-disco-receive(from master [[dvlp_FRONTEND2_coordinator][-O87CxU3RRSTHZkuC985Yw][FRONTEND2][inet[/192.168.10.73:55591]]{datacentrename=site2, data=false, nodename=dvlp_FRONTEND2_coordinator, master=true}])
[2014-11-07 17:25:25,598][INFO ][monitor.jvm ] [dvlp_FRONTEND2] [gc][old][739][27] duration [8s], collections [1]/[9s], total [8s]/[8.8s], memory [7.8gb]->[7.7gb]/[7.9gb], all_pools {[young] [172.4mb]->[46.5mb]/[199.6mb]}{[survivor] [24.9mb]->[0b]/[24.9mb]}{[old] [7.6gb]->[7.7gb]/[7.7gb]}
[2014-11-07 17:25:46,387][INFO ][monitor.jvm ] [dvlp_FRONTEND2] [gc][old][746][32] duration [5s], collections [1]/[6s], total [5s]/[23.6s], memory [7.9gb]->[7.9gb]/[7.9gb], all_pools {[young] [195mb]->[199.6mb]/[199.6mb]}{[survivor] [0b]->[10.9mb]/[24.9mb]}{[old] [7.7gb]->[7.7gb]/[7.7gb]}
[2014-11-07 17:28:16,136][WARN ][index.warmer ] [dvlp_FRONTEND2] [dvlp_13_67_item_20140410][7] failed to load fixed bitset for [org.elasticsearch.index.search.nested.NonNestedDocsFilter@fd00879d]
org.elasticsearch.common.util.concurrent.ExecutionError: java.lang.OutOfMemoryError: Java heap space
at org.elasticsearch.common.cache.LocalCache$Segment.get(LocalCache.java:2201)
at org.elasticsearch.common.cache.LocalCache.get(LocalCache.java:3937)
at org.elasticsearch.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4739)
at org.elasticsearch.index.cache.fixedbitset.FixedBitSetFilterCache.getAndLoadIfNotPresent(FixedBitSetFilterCache.java:139)
at org.elasticsearch.index.cache.fixedbitset.FixedBitSetFilterCache.access$100(FixedBitSetFilterCache.java:75)
at org.elasticsearch.index.cache.fixedbitset.FixedBitSetFilterCache$FixedBitSetFilterWarmer$1.run(FixedBitSetFilterCache.java:287)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.util.FixedBitSet.<init>(FixedBitSet.java:187)
at org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet(MultiTermQueryWrapperFilter.java:104)
at org.elasticsearch.common.lucene.search.NotFilter.getDocIdSet(NotFilter.java:49)
at org.elasticsearch.index.search.nested.NonNestedDocsFilter.getDocIdSet(NonNestedDocsFilter.java:46)
at org.elasticsearch.index.cache.fixedbitset.FixedBitSetFilterCache$2.call(FixedBitSetFilterCache.java:142)
at org.elasticsearch.index.cache.fixedbitset.FixedBitSetFilterCache$2.call(FixedBitSetFilterCache.java:139)
at org.elasticsearch.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4742)
at org.elasticsearch.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
at org.elasticsearch.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319)
at org.elasticsearch.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282)
at org.elasticsearch.common.cache.LocalCache$Segment.get(LocalCache.java:2197)
... 8 more
[2014-11-07 17:28:29,215][INFO ][monitor.jvm ] [dvlp_FRONTEND2] [gc][old][749][40] duration [22.9s], collections [4]/[2.3m], total [22.9s]/[1m], memory [7.9gb]->[7.9gb]/[7.9gb], all_pools {[young] [199.5mb]->[199.6mb]/[199.6mb]}{[survivor] [22.9mb]->[23.1mb]/[24.9mb]}{[old] [7.7gb]->[7.7gb]/[7.7gb]}
[2014-11-07 17:28:23,797][WARN ][index.warmer ] [dvlp_FRONTEND2] [dvlp_13_67_item_20140410][7] failed to load fixed bitset for [org.elasticsearch.index.search.nested.NestedDocsFilter@fd00879d]
org.elasticsearch.common.util.concurrent.ExecutionError: java.lang.OutOfMemoryError: Java heap space
at org.elasticsearch.common.cache.LocalCache$Segment.get(LocalCache.java:2201)
at org.elasticsearch.common.cache.LocalCache.get(LocalCache.java:3937)
at org.elasticsearch.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4739)
at org.elasticsearch.index.cache.fixedbitset.FixedBitSetFilterCache.getAndLoadIfNotPresent(FixedBitSetFilterCache.java:139)
at org.elasticsearch.index.cache.fixedbitset.FixedBitSetFilterCache.access$100(FixedBitSetFilterCache.java:75)
at org.elasticsearch.index.cache.fixedbitset.FixedBitSetFilterCache$FixedBitSetFilterWarmer$1.run(FixedBitSetFilterCache.java:287)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.util.FixedBitSet.<init>(FixedBitSet.java:187)
at org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet(MultiTermQueryWrapperFilter.java:104)
at org.elasticsearch.index.search.nested.NestedDocsFilter.getDocIdSet(NestedDocsFilter.java:50)
at org.elasticsearch.index.cache.fixedbitset.FixedBitSetFilterCache$2.call(FixedBitSetFilterCache.java:142)
at org.elasticsearch.index.cache.fixedbitset.FixedBitSetFilterCache$2.call(FixedBitSetFilterCache.java:139)
at org.elasticsearch.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4742)
at org.elasticsearch.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
at org.elasticsearch.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319)
at org.elasticsearch.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282)
at org.elasticsearch.common.cache.LocalCache$Segment.get(LocalCache.java:2197)
... 8 more
```
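For anyone trying to reason about the trace above: both failures occur while allocating a `FixedBitSet` from the index warmer, and a fixed bitset needs one bit per Lucene document in the segment (nested documents count as separate Lucene documents). The sketch below is only a back-of-envelope estimate of that cost, not a measurement from this cluster. The segment sizes and the number of filters warmed per segment are hypothetical, and the assumption that one bitset is loaded per filter per segment is inferred from the stack trace rather than confirmed.

```python
def fixed_bitset_bytes(max_doc: int) -> int:
    """Bytes in the long[] backing one fixed bitset with max_doc bits."""
    words = (max_doc + 63) // 64  # one 64-bit word covers 64 documents
    return words * 8


def warm_cost_bytes(segment_max_docs, filters_per_segment: int) -> int:
    """Estimated heap if every filter gets a full bitset on every segment.

    filters_per_segment is an assumption: e.g. the non-nested filter plus
    one filter per nested mapping. Object headers are ignored.
    """
    per_filter = sum(fixed_bitset_bytes(m) for m in segment_max_docs)
    return per_filter * filters_per_segment


if __name__ == "__main__":
    # Hypothetical shard: three segments, sizes in Lucene documents.
    segments = [50_000_000, 20_000_000, 5_000_000]
    cost = warm_cost_bytes(segments, filters_per_segment=10)
    print(f"estimated bitset heap for this shard: {cost / 2**20:.1f} MiB")
```

Multiplying the result by the number of shards a node recovers gives a rough lower bound on the heap the warmer would need, which can then be compared with the 8GB heap in the log.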