Long running query, doc_fields, and lots of exceptions

I'm running a relatively large index (2.8B documents, 32 shards), mostly for
analytics (all fields are not_analyzed, doc_fields).
Many queries take a long time to execute and often produce a plethora of
shard failures, which I'm trying to track down and fix.
I would appreciate any ideas.

Here's a collection of shard failures generated by a single query:

        "shard": 7,
        "status": 500,
        "reason": "NodeDisconnectedException[[i-5907f074][inet[/10.0.1.226:9300]][search/phase/query] disconnected]"

        "shard": 4,
        "status": 404,
        "reason": "RemoteTransportException[[i-db51a6f6][inet[/10.0.1.229:9300]][search/phase/fetch/id]]; nested: SearchContextMissingException[No search context found for id [5946]]; "

        "status": 500,
        "reason": "RemoteTransportException[[i-d4b559f9][inet[/10.0.1.231:9300]][search/phase/query]]; nested: IllegalArgumentException[Self-suppression not permitted]; nested: OutOfMemoryError[Java heap space]; "
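To see which problem dominates, failures like these can be tallied by their innermost exception. The following is a minimal Python sketch, assuming the failures are available as a list of dicts shaped like the entries above; the helper names root_cause and tally_failures are made up for illustration and are not part of any Elasticsearch client.

```python
import re
from collections import Counter

# Matches exception-style names such as "OutOfMemoryError[" or
# "SearchContextMissingException[" inside a failure reason string.
EXC_NAME = re.compile(r"([A-Za-z]+(?:Exception|Error))\[")


def root_cause(reason):
    """Return the last (innermost) exception name found in a reason string."""
    names = EXC_NAME.findall(reason)
    return names[-1] if names else "unknown"


def tally_failures(failures):
    """Count shard failures grouped by their innermost exception name."""
    return Counter(root_cause(f.get("reason", "")) for f in failures)


# The three failures quoted above, as they would appear after JSON parsing.
failures = [
    {"shard": 7, "status": 500,
     "reason": "NodeDisconnectedException[[i-5907f074][inet[/10.0.1.226:9300]]"
               "[search/phase/query] disconnected]"},
    {"shard": 4, "status": 404,
     "reason": "RemoteTransportException[[i-db51a6f6][inet[/10.0.1.229:9300]]"
               "[search/phase/fetch/id]]; nested: SearchContextMissingException"
               "[No search context found for id [5946]]; "},
    {"status": 500,
     "reason": "RemoteTransportException[[i-d4b559f9][inet[/10.0.1.231:9300]]"
               "[search/phase/query]]; nested: IllegalArgumentException"
               "[Self-suppression not permitted]; nested: OutOfMemoryError"
               "[Java heap space]; "},
]

counts = tally_failures(failures)
for name, n in counts.most_common():
    print(name, n)
```

Taking the last exception name rather than the first skips the RemoteTransportException wrapper, so the OOM and the missing search context show up as separate buckets.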

I've seen sporadic occurrences of SearchContextMissingException even in the
absence of OOM. Any pointers on how to increase the timeouts?
Obviously, the out-of-memory error is an issue, which I'm planning to resolve by
moving some shards around. (The index is hosted on a mix of two machine
types: 15 GB RAM and 4 GB RAM. The smaller nodes
only have 2 shards each, while the larger ones have many more.)
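
For the shard move itself, the request body for the cluster reroute API (POST /_cluster/reroute) could be built roughly as below. This is only a sketch: the index name "my_index", the shard number, and the choice of source and target node are placeholders for illustration, not the actual layout of this cluster.

```python
import json


def move_command(index, shard, from_node, to_node):
    """Build one 'move' command for the cluster reroute API."""
    return {
        "move": {
            "index": index,
            "shard": shard,
            "from_node": from_node,
            "to_node": to_node,
        }
    }


# Hypothetical values: index name, shard number and node roles are assumed.
body = {
    "commands": [
        move_command("my_index", 0, "i-d4b559f9", "i-5907f074"),
    ]
}

# This JSON would be sent as the body of POST /_cluster/reroute.
payload = json.dumps(body, indent=2)
print(payload)
```

Several move commands can go in the same "commands" list, so a rebalancing plan can be submitted in one request.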
