I'm running a two-node Elasticsearch cluster (v5.6.10) which I monitor, and I've noticed some interesting graphs that I can't explain. The first one is a graph of merge operations over time:
As far as I can see, the growth is roughly linear, so I assume there's something that explains this, but I can't find any information. Even more interesting, at 00:00 it drops to zero. Can someone explain what causes this?
The second graph is pretty much the same as the first one, but it shows the heap used by the cluster:
This looks like a memory leak to me, and again, around 00:00 the heap usage resets.
Here's a graph of the Elasticsearch operations (indexing rate and search rate). As you can see, there's almost no indexing, and search requests peak at 40 per second, which I don't think is much load.
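For reference, the numbers behind these graphs correspond to the counters exposed by the node stats API. Here's a minimal sketch of how they can be polled with Python (assumptions: the cluster is reachable at http://localhost:9200 without authentication, and the 10-second interval is arbitrary):

```python
import time

import requests

BASE = "http://localhost:9200"  # assumption: cluster reachable here, no auth


def poll(interval=10):
    """Print merge and heap counters for every node in the cluster."""
    while True:
        stats = requests.get(BASE + "/_nodes/stats/indices,jvm").json()
        for node in stats["nodes"].values():
            merges = node["indices"]["merges"]  # cumulative since node start
            heap = node["jvm"]["mem"]
            print(
                "{}: merges total={} current={} heap used={}%".format(
                    node["name"],
                    merges["total"],
                    merges["current"],
                    heap["heap_used_percent"],
                )
            )
        time.sleep(interval)


if __name__ == "__main__":
    poll()
```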
The issue I'm facing is that the peaks in all graphs coincide with my application's 'rush hour', during which the system becomes unresponsive.
Here's some information about the setup of the cluster:
I have two virtual machines, and each node runs in a separate Docker container on its own machine.
Node 1 hardware (the GREEN graph):
- 8-core Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
- a spinning hard drive
- 16 GB of heap allocated
Node 2 hardware (the YELLOW graph):
- 8-core Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
- a spinning hard drive
- 16 GB of heap allocated
The DB cluster also runs on these virtual machines, so the processors are shared between the DB cluster and the Elasticsearch cluster.
Another thing worth mentioning is that, because of the spinning disks, I tried the recommendation from https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-merge.html, but it didn't change anything. I applied the setting at the index level without restarting the cluster, since I found in several places that it's a dynamic (runtime) setting.
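Specifically, the spinning-disk recommendation on that page is to limit the merge scheduler to a single thread. Here's roughly how I applied it (the index name is a placeholder, and the same localhost/no-auth assumptions as above apply):

```python
import requests

BASE = "http://localhost:9200"  # assumption: cluster reachable here, no auth

# index.merge.scheduler.max_thread_count is a dynamic index-level
# setting, so it can be changed at runtime without a restart.
resp = requests.put(
    BASE + "/my-index/_settings",  # "my-index" is a placeholder
    json={"index": {"merge": {"scheduler": {"max_thread_count": 1}}}},
)
resp.raise_for_status()
print(resp.json())  # expect {"acknowledged": true}
```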