I've been using ES for a while, but I don't have much experience configuring and optimising its settings.
In short: I keep crashing my nodes with large queries when the heap fills up.
The long version: I have, in this case, a 2-node cluster (JVM: 1.7.0_85, ES: 1.7.3, Ubuntu 14.04 LTS) with pretty much default installations except for:
bootstrap.mlockall: true. This is recommended everywhere for performance, but maybe I should be setting it to false if I'm going to run queries that exceed the heap?
Update: Turns out that when I checked with
curl http://localhost:9200/_nodes/process?pretty, mlockall was false. I ran
ulimit -l unlimited, and it is true now. I'll test whether it makes a difference.
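In case it matters: from what I've read, a ulimit change made in a shell doesn't survive a service restart, so I've also set the limit in the service defaults. On Ubuntu 14.04 with the .deb package that file should be /etc/default/elasticsearch (treat the exact path as my assumption):

```
# /etc/default/elasticsearch
# Allow the elasticsearch user to lock the whole heap in RAM,
# so bootstrap.mlockall: true actually takes effect after a restart.
MAX_LOCKED_MEMORY=unlimited
```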
I have daily indexes with about 100 million docs per day, 4 shards each with 1 replica.
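For reference, the shard/replica settings come from an index template along these lines (the template name and index pattern here are placeholders, not my real ones):

```
curl -XPUT http://localhost:9200/_template/daily_logs -d '{
  "template": "logs-*",
  "settings": {
    "number_of_shards": 4,
    "number_of_replicas": 1
  }
}'
```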
Most of the time I want to be looking at today's data, and things seem to be working fine, if not great, most of the time. Some sluggish response times can be credited to the hardware (although they are two pretty decent Dell servers, with dual 6-core processors, 96GB RAM and an 8-drive RAID), which I'm OK with / can work on.
However, from time to time I need to look at a couple of days' worth of data (using Kibana for this), at which point, with the current data set, the heap fills up rapidly and then the node crashes.
Again, I'm OK with slow responses for a large data set, but I can't have the nodes fall over. So, what can I configure to prevent that?
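For what it's worth, the knobs I've found so far are the fielddata cache size and the circuit breaker limits in elasticsearch.yml. I believe these are the 1.x setting names, but I haven't tried them yet, and the percentages are just examples:

```
# elasticsearch.yml -- example values, not recommendations
indices.fielddata.cache.size: 40%      # evict old fielddata instead of growing unbounded
indices.breaker.fielddata.limit: 60%   # trip before fielddata alone fills the heap
indices.breaker.request.limit: 40%     # per-request structures (e.g. aggregations)
indices.breaker.total.limit: 70%       # combined limit across breakers
```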
1. As far as I understand, it's best to keep the heap at 32GB instead of raising it above that?
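If I understand correctly, that would mean starting the nodes with something like this (31g rather than 32g, since compressed object pointers reportedly stop working right around the 32GB mark; the file path assumes the .deb package again):

```
# /etc/default/elasticsearch
ES_HEAP_SIZE=31g
```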
2. As mentioned before, my indexes are created with 4 shards and 1 replica (i.e. four shards on each of the nodes). I figured replicated shards are better, especially if nodes are falling over around me. But also, with all the data on both nodes, I would have imagined the query load to be shared. On the contrary, all 4 primary shards seem to be allocated to one node or the other (seemingly at random), and that node then seems to try to service all queries for that data.
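This is how I've been checking where the primaries end up:

```
curl 'http://localhost:9200/_cat/shards?v'
```

which lists every shard with its index, whether it's a primary (p) or a replica (r), and which node it's on (?v just adds the column headers).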
Should I change my indexing scheme? Should I be setting up my nodes with dedicated roles instead? Although often I only have one node to run on until I can justify adding more hardware.
3. If you should only assign 32GB of heap to a node, and a node falls over (or the fielddata breaker trips) once the heap is full and all the field data needs to be loaded for a query to run... does that mean a single node essentially can't query more than 32GB of data at a time?
Sorry for the long post... any help, pointers or suggestions appreciated.