Need help with configuring memory usage

Hi.

I've been using ES for a while, but I don't have much experience configuring and optimising its settings.

In short: I keep crashing my nodes with large queries when the heap fills up.

The long version: I have, in this case, a 2-node cluster (JVM: 1.7.0_85, ES: 1.7.3, Ubuntu 14.04 LTS) with pretty much default installations, except for:
ES_HEAP_SIZE=32g
MAX_OPEN_FILES=262140
bootstrap.mlockall: true - This is recommended everywhere for performance, but maybe I should be setting this to false if I'm going to run queries exceeding the heap?

Update: it turns out that when I checked curl http://localhost:9200/_nodes/process?pretty, mlockall was false. After running ulimit -l unlimited it's now true. Will test if it makes a difference.
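For anyone else hitting this: a plain ulimit -l unlimited only affects the current shell, so to make it stick across restarts I did roughly the following (paths are for the Ubuntu/Debian package; adjust to your layout):

```bash
# Let the elasticsearch user lock memory permanently
echo "elasticsearch - memlock unlimited" | sudo tee -a /etc/security/limits.conf

# The Debian/Ubuntu package also honours this in /etc/default/elasticsearch:
#   MAX_LOCKED_MEMORY=unlimited

sudo service elasticsearch restart

# Verify - "mlockall" should now report true
curl -s 'http://localhost:9200/_nodes/process?pretty' | grep mlockall
```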

I have daily indexes with about 100 million docs per day, 4 shards each with 1 replica.
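For context, the shard and replica counts come from an index template along these lines (the template name and index pattern here are illustrative, not my actual ones):

```bash
# Every new daily index matching the pattern gets 4 shards + 1 replica
curl -XPUT 'http://localhost:9200/_template/daily_logs' -d '{
  "template": "logs-*",
  "settings": {
    "number_of_shards": 4,
    "number_of_replicas": 1
  }
}'
```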

Most of the time I want to be looking at today's data, and things seem to be working fine, if not great, most of the time. Some sluggish response times can be credited to the hardware (although they are 2 pretty decent Dell servers with dual 6-core processors, 96GB RAM and an 8-drive RAID), which I'm OK with / can work on.

However, from time to time I need to look at a couple of days' worth of data (using Kibana for this), at which point, with the current data set, the heap fills up rapidly and then the node crashes.
Again, I'm OK with slow responses for a large data set, but I can't have the nodes fall over. So, what can I configure to prevent that?
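The only knobs I've found so far are the fielddata cache size and the fielddata circuit breaker; something like the below is what I'm planning to try (the percentages are guesses on my part, not recommendations):

```bash
# Static setting - goes in elasticsearch.yml and needs a node restart:
#   indices.fielddata.cache.size: 40%
# (bounds how much heap fielddata may occupy; unbounded by default in 1.x)

# The fielddata breaker can be tightened on a live cluster instead:
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "persistent": {
    "indices.breaker.fielddata.limit": "50%"
  }
}'
```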

.1 As far as I understand, it's best to keep the heap at 32GB rather than raising it above that?

.2 As mentioned before, my indexes are created with 4 shards and 1 replica (i.e. four shards on each of the nodes). I figured replicated shards are better, especially with nodes falling over around me. And with all the data on both nodes, I would have imagined the query load to be shared. On the contrary, it seems all 4 primary shards get allocated to one node or the other (seemingly at random), and that node then seems to try and service all queries for that data.
Should I change my indexing scheme? Should I be setting up my nodes with dedicated roles instead? That said, often I only have 1 node to run on until I can justify adding more hardware.
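For what it's worth, this is how I've been checking where the shards land (the index name is just an example):

```bash
# Lists every shard of the index with its state, which node it's on,
# and whether it's a primary (p) or a replica (r)
curl -s 'http://localhost:9200/_cat/shards/logs-2015.11.20?v'
```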

.3 If you should only assign 32GB of heap to a node, and the node falls over (or the fielddata breaker trips) once the heap is full while all the field data still needs to be loaded for the query to run... does that mean a single node essentially can't query more than 32GB of data at a time? :confused:
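For the record, I've been watching the fielddata grow with the command below; listing all fields is overkill, but it shows the per-field heap cost:

```bash
# Per-node fielddata heap usage, broken down per field
curl -s 'http://localhost:9200/_nodes/stats/indices/fielddata?pretty&fields=*'
```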

Sorry for the long post... any help, pointers or suggestions appreciated.

bootstrap.mlockall: true - This is recommended everywhere for performance, but maybe I should be setting this to false if I'm going to run queries exceeding the heap?

I think you're conflating unrelated things. Enabling bootstrap.mlockall means that ES's memory pages can't be evicted from RAM, and you never want swapping to occur.
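If mlockall isn't an option for some reason, the other common way to guarantee that is to take swap out of the equation entirely, e.g.:

```bash
# Turn off swap now (also remove swap entries from /etc/fstab to make it
# permanent), or at least make the kernel very reluctant to swap:
sudo swapoff -a
sudo sysctl vm.swappiness=1
```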

.1 As far as I understand, it's best to keep the heap at 32GB rather than raising it above that?

Yes, except that the limit is strictly lower than 32 GB. I've never seen an authoritative figure from Oracle, but I've heard both 30.5 GB and "anything lower than 32 GB is okay".
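Rather than guessing, you can ask your particular JVM where the cutoff is, e.g.:

```bash
# If UseCompressedOops is still true at a given -Xmx, you're under the limit
java -Xmx31g -XX:+PrintFlagsFinal -version 2>/dev/null | grep UseCompressedOops
java -Xmx32g -XX:+PrintFlagsFinal -version 2>/dev/null | grep UseCompressedOops
```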

Thanks for the reply.

OK, so it simply means that the 32GB heap will remain resident?
From the Elastic documentation: "mlockall might cause the JVM or shell session to exit if it tries to allocate more memory than is available!" Is this not what's happening when I'm trying to execute a large query?

OK, I lied: I actually had it set to 31GB, specifically for the < 32GB reason, but I can lower it to 30. However, that's just a performance-related issue, isn't it?
I know you'll technically have 'less' heap available if you raise it above 32GB, because of the pointer sizes, but my problem is not so much about not having enough/optimal heap as about what happens when it runs out.

No, that only applies at startup.