Incomprehensible Elasticsearch behaviour

I'm running a two-node Elasticsearch cluster (v5.6.10) which I monitor, and I've noticed some graphs I can't explain. The first one is a graph of merge operations over time:

As far as I can see, the growth is roughly linear, so I assume there is an explanation for it, but I can't find any information. Even more interesting, at 00:00 it drops to zero. Can someone explain what is causing this?

The second graph looks much the same as the first one, but it shows the heap used by the cluster:

This looks like a memory leak to me, and again the heap usage resets around 00:00.

Here's a graph of the Elasticsearch operations (indexing rate and search rate). As you can see, there is almost no indexing, and searches peak at about 40 requests per second, which I don't think is much load.
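
For anyone who wants to cross-check those numbers: the rates are presumably just per-second deltas of the cumulative counters Elasticsearch exposes, so something like this sketch reproduces them straight from the index stats API (the URL and sampling interval are placeholders):

```python
import time

import requests

ES = "http://localhost:9200"  # placeholder; point this at one of the nodes

def totals():
    # Cumulative counters since node start, summed over all indices
    stats = requests.get(f"{ES}/_all/_stats/indexing,search").json()
    total = stats["_all"]["total"]
    return total["indexing"]["index_total"], total["search"]["query_total"]

INTERVAL = 60  # seconds between the two samples
idx0, q0 = totals()
time.sleep(INTERVAL)
idx1, q1 = totals()

print(f"indexing rate: {(idx1 - idx0) / INTERVAL:.1f} docs/s")
print(f"search rate:   {(q1 - q0) / INTERVAL:.1f} queries/s")
```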


The issue I'm facing is that the peaks on all graphs coincide with the 'rush hour' of my application, and during that time the system becomes unresponsive.

Here's some information about the setup of the cluster:

I have 2 virtual machines, and each node runs in a separate Docker container on one of them.

Node 1 hardware (the GREEN graph):

  • 8-core Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
  • a spinning hard drive
  • 16 GB of heap allocated

Node 2 hardware (the YELLOW graph):

  • 8-core Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
  • a spinning hard drive
  • 16 GB of heap allocated

The database cluster also runs on these virtual machines, so the CPUs are shared between the database cluster and the Elasticsearch cluster.

Another thing worth mentioning: because of the spinning disks I tried the recommendation from https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-merge.html, but it didn't change anything. I applied the setting at the index level without restarting the cluster, since several sources say it's a dynamic (runtime) setting.
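
Concretely, applying that kind of setting at runtime looks roughly like this (the index name and URL are placeholders, and the single-thread merge scheduler value is the spinning-disk recommendation as I understood it):

```python
import requests

ES = "http://localhost:9200"  # placeholder for one of the cluster nodes

# Dynamic index-level setting, applied without a restart: limit the merge
# scheduler to a single thread, as suggested for spinning disks.
resp = requests.put(
    f"{ES}/my_index/_settings",  # 'my_index' is a placeholder
    json={"index.merge.scheduler.max_thread_count": 1},
)
print(resp.json())  # expect {"acknowledged": true}
```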

A few general comments:

5.X has been EOL for some time now. You should upgrade as a matter of urgency.

Also, 2 nodes is not ideal: you run the risk of not being able to maintain a quorum, i.e. split brain.
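
If you do add a third master-eligible node, the 5.x quorum setting can, as far as I recall, be changed at runtime through the cluster settings API; a rough sketch (URL is a placeholder):

```python
import requests

ES = "http://localhost:9200"  # any node in the cluster

# With 3 master-eligible nodes, a quorum is 2 (a majority of master-eligible
# nodes). discovery.zen.minimum_master_nodes is a dynamic setting in 5.x,
# so it can be pushed without a restart.
resp = requests.put(
    f"{ES}/_cluster/settings",
    json={"persistent": {"discovery.zen.minimum_master_nodes": 2}},
)
print(resp.json())
```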

Sharing the hosts with your database cluster is also unlikely to be a good thing, due to resource contention.

Moving on, though: what do your Elasticsearch logs show for things like GC?
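
If GC logging isn't enabled, the nodes stats API still gives a rough picture of heap pressure and old-generation collections per node; something along these lines should do (URL is a placeholder, field names as I recall them for 5.x):

```python
import requests

ES = "http://localhost:9200"  # any node in the cluster

stats = requests.get(f"{ES}/_nodes/stats/jvm").json()

for node_id, node in stats["nodes"].items():
    jvm = node["jvm"]
    old = jvm["gc"]["collectors"]["old"]
    print(
        f"{node['name']}: "
        f"heap {jvm['mem']['heap_used_percent']}% used, "
        f"old-gen GCs {old['collection_count']} "
        f"({old['collection_time_in_millis']} ms total)"
    )
```

If the old-gen collection time climbs sharply during your rush hour, that would point at heap pressure rather than merges on the spinning disks.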

I can't upgrade right now, because our usage is tightly coupled to Hibernate Search, and unfortunately the new version of Hibernate Search that supports ES 6 and above is still in development.

Regarding the logs: no, there's nothing suspicious in them (no GC logs at all).

Does anyone know of any automatic operations executed on the cluster? As the graphs show, the problem happens at exactly the same time every day. It looks as if a cron job runs and leads to some service interruption.

Not in 5.X. Is there anything in cron on the hosts?
