Elasticsearch high CPU usage due to one index and replica doesn't help

jaykb77 · April 6, 2024, 5:32am

Hi all,

We have an ES cluster spanning different cloud regions, say region1 and region2. We have the configuration below in elasticsearch.yml (this one is for the region1 machines), so an index with one primary and one replica (1p:1r) will have either its primary or its replica in each region.

cluster.routing.allocation.awareness.attributes: region
node.attr.region: region1

We route our traffic to one region through DNS+load balancer and we started seeing an issue with one of the indices recently.

Let's say we are currently routing traffic through DNS to region1, and an index, say index1, has its primary on a node in region1 and its replica on a node in region2.

Due to excessive querying of data in index1, CPU on the node that hosts the index's primary in region1 gets exhausted, but we also notice that the node in region2 that holds the replica doesn't show much activity.

Shouldn't Elasticsearch be routing/dividing the traffic and getting the node with the replica to help as well in such a situation?

Christian_Dahlqvist · April 6, 2024, 6:28am

It is generally not recommended to deploy Elasticsearch across regions unless they are quite close and offer very low latency between them. With just 2 regions it is also impossible to make the cluster highly available (HA), so a third region may be required.

Which version of Elasticsearch are you using? If you are on a reasonably new version, Elasticsearch by default uses adaptive replica selection when executing a query. If you have long latencies between the regions, queries executed against the remote shard will be slower and the local shard is likely to be favoured. You may want to experiment with disabling this, but that will send approximately half of the queries to the remote shard, which could increase latencies for all/most queries.
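
For anyone who wants to try this, here is a minimal sketch of how adaptive replica selection could be switched off through the cluster settings API. The dynamic setting `cluster.routing.use_adaptive_replica_selection` is the real one; the host `http://localhost:9200`, the helper names and the choice of a persistent (rather than transient) setting are assumptions for illustration only. The script only builds and prints the request; sending it is left commented out.

```python
import json
import urllib.request

# Real Elasticsearch dynamic cluster setting that toggles adaptive replica selection.
ARS_SETTING = "cluster.routing.use_adaptive_replica_selection"


def build_ars_settings(enabled: bool, persistent: bool = True) -> dict:
    """Build the body for PUT /_cluster/settings (hypothetical helper)."""
    scope = "persistent" if persistent else "transient"
    return {scope: {ARS_SETTING: enabled}}


def build_request(base_url: str, enabled: bool) -> urllib.request.Request:
    """Prepare, but do not send, the cluster settings update request."""
    body = json.dumps(build_ars_settings(enabled)).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Assumed address; replace with a node in your own cluster.
    req = build_request("http://localhost:9200", enabled=False)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually apply the change (add authentication/TLS as your cluster requires):
    # urllib.request.urlopen(req)
```

Setting the value back to `true` restores the default behaviour, so it is easy to compare CPU and latency with and without it.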

jaykb77 · April 6, 2024, 8:38am

We have 7.17.x. Yeah, increased latency is a tradeoff, but looking at the situation we have with adaptive replica selection enabled, disabling it could leave us in a relatively better state overall. We will try this out. Thank you.

jaykb77 · April 6, 2024, 8:50am

Yes. But we had to stick to this setup due to budget-related concerns. We are trying to have a voting node in a third location to address this. Thank you again for all the suggestions.
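
To illustrate why the third location matters, here is a simplified sketch of the quorum arithmetic: a master can only be elected while a strict majority of the voting nodes is reachable. The node counts and location names below are made up for the example, and the model ignores how Elasticsearch adjusts its voting configuration over time, so treat it as a rough picture rather than the exact election logic.

```python
def has_quorum(voting_per_location: dict, failed: str) -> bool:
    """True if the voting nodes outside `failed` still form a strict majority."""
    total = sum(voting_per_location.values())
    surviving = total - voting_per_location[failed]
    return surviving > total // 2


def survives_any_single_failure(voting_per_location: dict) -> bool:
    """True if losing any one location still leaves a majority of voting nodes."""
    return all(has_quorum(voting_per_location, loc) for loc in voting_per_location)


if __name__ == "__main__":
    # Hypothetical layouts: numbers are voting (master-eligible) nodes per location.
    two_regions = {"region1": 2, "region2": 1}
    with_tiebreaker = {"region1": 1, "region2": 1, "third_location": 1}

    print(survives_any_single_failure(two_regions))      # losing region1 loses the majority
    print(survives_any_single_failure(with_tiebreaker))  # any single location can be lost
```

With only two locations, whichever one holds the majority of voting nodes becomes a single point of failure; a small voting node in a third location removes that.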
