Shards per node, Heap, 100% CPU....help please

Hi, I have a 2 node cluster with 1500 indices. Indices persist over time. This was set up by a third party... but now I am having the problem that the master node becomes unresponsive (100% CPU) and the only way to recover it is by restarting the elasticsearch service.

Each index has 5 shards and 1 replica.... The cluster has worked very well for the last 5 years... it is around 50GB in size...

I was reading about shards and came across a post recommending 20 shards per 1GB of heap on a node. Now I realise my setup far exceeds this.... (13000 shards on 2 nodes) Each node has a 4GB heap.
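Just to spell out the arithmetic (using the 20-shards-per-GB rule of thumb and the numbers above; the exact figures may differ slightly from your observed count since not every index may have identical settings):

```python
# Rough shard-budget arithmetic based on the numbers in this thread.
heap_gb_per_node = 4        # each node has a 4 GB heap
nodes = 2
shards_per_gb_heap = 20     # commonly cited rule of thumb

recommended_max = heap_gb_per_node * shards_per_gb_heap * nodes
print(recommended_max)      # 160 shards for the whole cluster

indices = 1500
primaries_per_index = 5
replicas = 1
actual = indices * primaries_per_index * (1 + replicas)
print(actual)               # 15000 shards (primaries + replicas)
print(actual / recommended_max)  # ~94x over the guideline
```

So the cluster is holding roughly two orders of magnitude more shards than the guideline suggests for this heap size.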

The frequency of the cluster becoming unresponsive is slowly increasing. I am in the process of migrating to the latest version of Elasticsearch in the hope that it will help resolve the problem... although I also really need to upgrade to stay within the support periods... I'm on 1.7 at the moment.

So I have been reading about setups/config etc and trying to work out how I can improve the current setup...

a) Could the number of shards relative to heap size be causing the problem I am seeing, given that more data and indices are being added over time?
b) Would decreasing the number of shards per index be the way to go?

Any help would be appreciated....

Yes. Having lots of very small shards is very inefficient and can cause performance issues as well as severe stability problems.

Yes, that will reduce the rate at which the shard count increases. You do however also need to dramatically reduce the existing shard count, and as you are on such an old version your options are, as far as I can remember, limited to deleting indices or reindexing them into fewer larger ones.
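On a modern cluster the "reindex into fewer larger indices" part can be done server-side with the `_reindex` API (available from Elasticsearch 2.3 onwards, so it works on the new 7.13 cluster but not on 1.7). A sketch with hypothetical index names:

```shell
# Sketch only: consolidate several small indices matching a pattern
# into one larger index. "logs-2021-*" and "logs-2021" are hypothetical
# names -- substitute your own. Requires Elasticsearch 2.3+.
curl -X POST "localhost:9200/_reindex?pretty" \
  -H 'Content-Type: application/json' \
  -d '{
    "source": { "index": "logs-2021-*" },
    "dest":   { "index": "logs-2021" }
  }'
```

On 1.7 the equivalent has to be done client-side, e.g. scroll the old indices and bulk-index into the new ones, which matches the external export/re-import approach described later in this thread.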

Hi Christian, thank you for the very quick reply...

What would you suggest is the correct number of shards per index in my scenario... or at least your best guess? I assume I want at least 2? But should it be 3/4?

It is also worth noting that although the templates of the indices are all the same and the type of data they store is the same... some are virtually empty and others are "very" large... i.e. their sizes are not in any way consistent... is there some strategy I should deploy here?
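Since the move is to 7.13 anyway, one common approach is to default new indices to a single primary shard (which is the 7.x default) via an index template, and only raise the shard count for the indices known to grow large. A sketch with hypothetical template and pattern names (the composable `_index_template` API exists from 7.8 onwards):

```shell
# Sketch: give new indices matching a pattern 1 primary shard and 1 replica.
# "small-indices" and "myapp-*" are hypothetical names -- substitute your own.
curl -X PUT "localhost:9200/_index_template/small-indices?pretty" \
  -H 'Content-Type: application/json' \
  -d '{
    "index_patterns": ["myapp-*"],
    "template": {
      "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 1
      }
    }
  }'
```

A single shard comfortably handles tens of GB of data, so with indices in this size range most of them likely need only one primary shard.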

And finally, I am exporting the current data from 1.7.1, restructuring (manually re-indexing) the docs externally, and then importing it into 7.13... so that should work well for this. I will leave the old cluster as is... but set up the new one with the "more appropriate" settings.

Thanks again!

Hi @kk123

Here is a great set of documents that talks about shard sizes and how to fix an oversharded cluster


Thanks @stephenb I'll certainly give that a read....