Continuous high CPU and load + EsRejectedExecutionException

Hi all,

We're running a cluster with the following configuration:

  • 6 data nodes (each has 31G RAM allocated to ES heap, and 16 vCPU - m4.4xlarge on AWS)
  • 1 main index (others are very small)
  • 21 shards
  • 1 replica
  • ~600,000,000 documents
  • 800G of data

We are running thousands of queries per second and continuously see high CPU and load (almost 100%), which causes an EsRejectedExecutionException every few minutes.

We are using ES v2.3.4.

What would you recommend doing in this case (without modifying the ES version or increasing the number of indices)?

Would increasing the CPU or the number of nodes be helpful?

Many thanks!

Hi Roni,

I would recommend checking the search slowlog to see which queries are taking a long time. From there, you may be able to optimize those queries so that they run faster and stop backing up the search queue.
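In case it helps, here is a rough sketch of turning the search slowlog on through the index settings API. The host, the index name (main_index) and the threshold values are placeholders I made up, so adjust them to your setup; the helper only builds the request, and you send it yourself once you are happy with it.

```python
import json
import urllib.request

# Assumptions: ES_URL and INDEX are placeholders, and the thresholds are
# arbitrary starting values, not recommendations.
ES_URL = "http://localhost:9200"
INDEX = "main_index"


def slowlog_settings_request(es_url, index, warn="2s", info="1s"):
    """Build a PUT <index>/_settings request that sets search slowlog thresholds."""
    body = {
        "index.search.slowlog.threshold.query.warn": warn,
        "index.search.slowlog.threshold.query.info": info,
        "index.search.slowlog.threshold.fetch.warn": warn,
    }
    return urllib.request.Request(
        url=f"{es_url}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = slowlog_settings_request(ES_URL, INDEX)
# To apply it against a live cluster:
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode("utf-8"))
```

Queries that exceed a threshold should then show up in the index search slowlog file on the data nodes. Starting with fairly high thresholds and lowering them gradually keeps the log readable under a load like yours.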

You could also check the hot_threads API (Nodes hot_threads, Elasticsearch Guide 6.4) to see what is taking up the bulk of the processing time.
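A minimal sketch of calling hot_threads from a script, assuming a node is reachable at the placeholder address below. The threads, interval and type query parameters are the ones the API documents; the values here are just a starting point.

```python
import urllib.parse
import urllib.request

# Assumption: placeholder address of any node in the cluster.
ES_URL = "http://localhost:9200"


def hot_threads_url(es_url, threads=3, interval="500ms", thread_type="cpu"):
    """Build the _nodes/hot_threads URL for the busiest `threads` per node."""
    query = urllib.parse.urlencode(
        {"threads": threads, "interval": interval, "type": thread_type}
    )
    return f"{es_url}/_nodes/hot_threads?{query}"


def fetch_hot_threads(es_url, **kwargs):
    """Return the plain-text hot threads report (needs a live cluster)."""
    with urllib.request.urlopen(hot_threads_url(es_url, **kwargs)) as resp:
        return resp.read().decode("utf-8")


# Example, run while CPU is pegged:
#   print(fetch_hot_threads(ES_URL, threads=5))
```

Taking a few samples during a spike and comparing the stack traces usually shows whether the time goes into query execution, merging, GC-related work or something else.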

"Would increasing the CPU or the number of nodes be helpful?"

This would probably help, though I can't tell for certain without knowing exactly why the queries are filling the queue. If you need a faster solution, you can almost always add resources.
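To see how the search queue is behaving on each node before adding hardware, something along these lines might help. It parses the text output of _cat/thread_pool; the column names follow the 2.x layout (later versions format this API differently), and the sample text is invented for illustration, not real output from your cluster.

```python
# Assumption: output of
#   GET _cat/thread_pool?v&h=host,search.active,search.queue,search.rejected
# on a 2.x cluster. SAMPLE is a made-up example of that shape.
SAMPLE = """\
host    search.active search.queue search.rejected
node-1  25            1000         4821
node-2  12            0            0
"""


def parse_cat_thread_pool(text):
    """Turn the whitespace-separated _cat table into a list of dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def nodes_with_rejections(rows):
    """Return the hosts whose search thread pool has rejected requests."""
    return [r["host"] for r in rows if int(r["search.rejected"]) > 0]


rows = parse_cat_thread_pool(SAMPLE)
print(nodes_with_rejections(rows))
```

If only one or two nodes show rejections, the load is probably uneven and more CPU alone may not fix it; if every node is rejecting, extra nodes or cheaper queries are the more likely remedies.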

Another thing I would recommend is to upgrade if you can. We've made a lot of improvements since 2.3.4; one in particular that might help you is Adaptive Replica Selection (Search APIs, Elasticsearch Guide 6.4).
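For reference, if you do end up on a version that has Adaptive Replica Selection (6.1 or later), it is controlled by a cluster setting. A sketch of the request, with a placeholder host, would look roughly like this; it does not apply to 2.3.4.

```python
import json
import urllib.request

# Assumption: placeholder address; only meaningful on 6.1+ clusters.
ES_URL = "http://localhost:9200"


def enable_ars_request(es_url, persistent=False):
    """Build a PUT _cluster/settings request that turns on adaptive replica selection."""
    scope = "persistent" if persistent else "transient"
    body = {scope: {"cluster.routing.use_adaptive_replica_selection": True}}
    return urllib.request.Request(
        url=f"{es_url}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = enable_ars_request(ES_URL)
# To apply it: urllib.request.urlopen(req)
```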
