We're running a cluster with the following configuration:
- 6 data nodes (m4.4xlarge on AWS: 16 vCPUs each, with 31G of RAM allocated to the ES heap)
- 1 main index (others are very small)
- 21 shards
- 1 replica
- ~600,000,000 documents
- 800G of data
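For reference, here is a rough back-of-the-envelope sketch of how that layout works out per node. It assumes the shards are balanced evenly across the data nodes and that the 800G figure covers primary shards only; both are assumptions for illustration, not something measured on the cluster.

```python
# Rough per-node layout for the cluster described above.
# Assumptions (not measured): shards are evenly balanced across nodes,
# and the 800G figure counts primary shards only.
data_nodes = 6
primary_shards = 21
replicas = 1
documents = 600_000_000
data_gb = 800

# Each primary has `replicas` copies, so total shard copies = primaries * (1 + replicas).
shard_copies = primary_shards * (1 + replicas)
copies_per_node = shard_copies / data_nodes

# Average size of one primary shard, in documents and in GB.
docs_per_primary = documents / primary_shards
gb_per_primary = data_gb / primary_shards

print(f"shard copies in the cluster: {shard_copies}")
print(f"shard copies per data node:  {copies_per_node:.1f}")
print(f"documents per primary shard: {docs_per_primary:,.0f}")
print(f"GB per primary shard:        {gb_per_primary:.1f}")
```

Under those assumptions, a query against the main index fans out to all 21 shards, and each node holds about 7 shard copies.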
We run thousands of queries per second and continuously see high CPU usage and load (almost 100%), which causes an EsRejectedExecutionException every few minutes.
We are using ES v2.3.4.
What would you recommend in this case (without changing the ES version or increasing the number of indices)?
Would increasing the CPU or the number of nodes be helpful?