Elasticsearch - Optimising Request Time | Infrastructure | Self Hosted AWS

Hello Team,

I am looking for help understanding the current state of the Elasticsearch cluster we run in production and optimising its response time. We are seeing high CPU utilisation, and both search query time and indexing time have increased.

A few stats on the Elasticsearch cluster:

  1. Total Number of Nodes : 9
    Data + Master : 4
    Data : 4
    Dedicated Master : 1

  2. We are using i3 instance types from AWS
    2 vCPU
    15.75 GB RAM
    475 GB NVMe SSD on each instance

  3. Total Data on Elasticsearch Cluster : 95.6GB

  4. Number of HTTP connections opened per minute : 4000-4500 (approx. 45,000 requests per 10-minute interval)

  5. Total Heap Allocated on each instance : 8 GB
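
To put the figures above in perspective, here is a rough back-of-the-envelope sketch. It assumes the 95.6 GB is spread evenly across the 8 data-holding nodes and that item 4 can be read as requests per minute; neither is confirmed in this thread, so treat the results as approximate. Under those assumptions it works out to roughly 12 GB of data per data node, about 67 to 75 requests per second, and a heap that is about 51% of RAM.

```python
# Figures copied from the cluster stats in this post.
data_nodes = 4 + 4               # "Data + Master" nodes plus "Data" nodes
vcpus_per_node = 2
total_data_gb = 95.6
ram_gb = 15.75
heap_gb = 8
requests_per_min = (4000, 4500)  # assumption: item 4 read as requests/minute

# Assumption: data is evenly distributed across the data-holding nodes.
data_per_node_gb = total_data_gb / data_nodes
heap_fraction = heap_gb / ram_gb
requests_per_sec = tuple(r / 60 for r in requests_per_min)
total_data_vcpus = data_nodes * vcpus_per_node

print(f"data per data node : {data_per_node_gb:.2f} GB")
print(f"heap / RAM         : {heap_fraction:.1%}")
print(f"requests per second: {requests_per_sec[0]:.1f} to {requests_per_sec[1]:.1f}")
print(f"vCPUs on data nodes: {total_data_vcpus}")
```

The data volume per node is small relative to the 475 GB disks, so with only 2 vCPUs per node, CPU looks like the more likely constraint than storage.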

We recently enabled Kibana Monitoring. Here is a screenshot for one of the indices, which has a high average latency of around 600 ms.

I am not sure where to begin or what exactly we are missing here. Can someone point us in the right direction on what needs to change?

Please let us know if any additional data is required.

Regards,
Somnath

Adding the metrics for the last 24 hours (screenshots: Overview and Advanced Metrics).

It looks like your search rate has been going up in parallel with your indexing operations. It makes sense that search response time would go up if you are indexing a large number of documents at the same time.
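
One setting that is often looked at when heavy indexing competes with search is `index.refresh_interval`, which controls how often newly indexed documents become searchable (the default is 1s). The sketch below only builds the method, path, and JSON body for an update-index-settings request; it does not contact a cluster. The index name `my-index` and the value `30s` are placeholders for illustration, not recommendations for this cluster, and whether a longer interval is acceptable depends on how fresh your periodic fetches need the data to be.

```python
import json


def refresh_settings_request(index_name, refresh_interval="30s"):
    """Build the method, path, and JSON body for an update-index-settings call.

    index_name and refresh_interval are illustrative placeholders.
    """
    path = f"/{index_name}/_settings"
    body = {"index": {"refresh_interval": refresh_interval}}
    return "PUT", path, json.dumps(body)


# Hypothetical index name; replace with the index showing high latency.
method, path, body = refresh_settings_request("my-index")
print(method, path)
print(body)
```

The printed method, path, and body can be sent with whatever HTTP client or console you already use against the cluster.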

Hey @gotigers,

Thanks for the response. Yes, the cluster is indexing a mix of large documents, with both inserts and updates on the index.

We also run fetches against the same index at periodic intervals.

Is there any possibility of optimising or fine tuning the settings or the index management so that we can reduce the overall time for search, indexing and refresh?

Regards,
Somnath