Spikes in Indexing Speed in Elasticsearch

Hi, for our application we currently use a single-node ES instance (we plan to migrate to a cluster in the future). Pushing data from my Java-based interface to the ES server takes around 10 ms on average. However, every day, for around 3-4 hours and at no fixed time or interval, indexing slows down and the average push duration rises to 200 ms. During this period the ES server's core utilization drops significantly. Since we use Kafka to feed data to the writer service that writes to ES, we do not lose any data, but the Kafka lag increases significantly. Once the incident is over, the ES server's core utilization rises for a while and the Kafka lag is cleared (thanks to the average time going back to 10 ms).
The volume of data pushed to ES for indexing is no different during these periods, and we have not yet enabled the search functionality, so the search load is always 0. I have gone through some of the documentation on improving indexing speed and applied some of the recommendations, but none of them resolve the spikes in indexing time.

Version of ES: 8.14.3
Version of Java client used for indexing data: 8.14.3
Total active indexes: 601
Total active shards: 601
Heap used percentage: 70% (max)
Allocated max heap: 20 GB

The following graph shows the core utilization of the ES service:

Use-case explanation: We use ES to provide a global search feature in our application. That is why the indexes in our ES are based on the different groups in our application, where users can search at most one group at a time.

Thanks in advance for the help here.

Check your Elasticsearch logs for messages about long or frequent GC around the times the issues occur. If you find any, it could indicate that your heap and RAM are too small.
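As a rough way to do that scan, here is a minimal Python sketch that filters log lines for GC overhead warnings. It assumes the warnings look like the "[gc][N] overhead, spent [X] collecting in the last [Y]" lines that Elasticsearch's JvmGcMonitorService normally writes; the exact wording can differ between versions, and the sample lines, the node name and the 50% threshold are made up for illustration.

```python
import re

# Assumed format of the JvmGcMonitorService overhead message; check it
# against a real line from your own log before relying on it.
GC_OVERHEAD = re.compile(
    r"\[gc\]\[\d+\] overhead, spent \[(?P<spent>[\d.]+)(?P<spent_unit>ms|s|m)\] "
    r"collecting in the last \[(?P<window>[\d.]+)(?P<window_unit>ms|s|m)\]"
)
# Elasticsearch log lines start with a bracketed timestamp.
TIMESTAMP = re.compile(r"^\[([^\]]+)\]")


def to_seconds(value, unit):
    """Convert a logged duration such as '300' + 'ms' to seconds."""
    number = float(value)
    if unit == "ms":
        return number / 1000.0
    if unit == "m":
        return number * 60.0
    return number  # unit == "s"


def find_gc_overhead(lines, min_ratio=0.5):
    """Return (timestamp, spent_s, window_s) for lines where GC took at
    least min_ratio of the reported window."""
    hits = []
    for line in lines:
        match = GC_OVERHEAD.search(line)
        if not match:
            continue
        spent = to_seconds(match["spent"], match["spent_unit"])
        window = to_seconds(match["window"], match["window_unit"])
        if window > 0 and spent / window >= min_ratio:
            stamp = TIMESTAMP.match(line)
            hits.append((stamp.group(1) if stamp else "", spent, window))
    return hits


# Hypothetical sample lines, not taken from a real cluster.
SAMPLE_LOG = """\
[2024-07-20T10:15:30,123][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][4821] overhead, spent [1.2s] collecting in the last [1.5s]
[2024-07-20T10:15:31,456][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][4822] overhead, spent [300ms] collecting in the last [1s]
[2024-07-20T10:15:32,000][INFO ][o.e.c.m.MetadataCreateIndexService] [node-1] creating index
"""

if __name__ == "__main__":
    for stamp, spent, window in find_gc_overhead(SAMPLE_LOG.splitlines()):
        print(f"{stamp}: {spent:.1f}s of GC in a {window:.1f}s window")
```

To use it on a real log, pass the lines of the file instead of SAMPLE_LOG and check whether the reported timestamps cluster inside the slow 3-4 hour windows.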

Another thing that can periodically slow down indexing in ingest-only workloads is heavy merging. You did not indicate what type of storage you are using, but if it is on the slow side it could cause this kind of issue. Have a look at disk I/O and await stats at normal times and compare them to when you are seeing these issues.
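If you do not already collect these stats (iostat -x reports await directly), a minimal Python sketch of the comparison could look like the following. It assumes a Linux host and derives the average await from two samples of /proc/diskstats; the device name "sda", the 10-second interval and the sample counter values are only placeholders.

```python
import time


def parse_diskstats_line(line):
    """Pick the counters needed for await out of one /proc/diskstats line."""
    fields = line.split()
    return {
        "device": fields[2],
        "reads": int(fields[3]),      # reads completed
        "read_ms": int(fields[6]),    # milliseconds spent reading
        "writes": int(fields[7]),     # writes completed
        "write_ms": int(fields[10]),  # milliseconds spent writing
    }


def average_await_ms(before, after):
    """Average time per completed I/O (ms) between two samples."""
    ios = (after["reads"] - before["reads"]) + (after["writes"] - before["writes"])
    if ios == 0:
        return 0.0
    busy_ms = (after["read_ms"] - before["read_ms"]) + (
        after["write_ms"] - before["write_ms"]
    )
    return busy_ms / ios


def sample_device(device, path="/proc/diskstats"):
    """Read the current counters for one block device (Linux only)."""
    with open(path) as handle:
        for line in handle:
            stats = parse_diskstats_line(line)
            if stats["device"] == device:
                return stats
    raise ValueError(f"device {device!r} not found in {path}")


if __name__ == "__main__":
    device = "sda"  # placeholder: use the device holding the ES data path
    first = sample_device(device)
    time.sleep(10)
    second = sample_device(device)
    print(f"{device}: average await {average_await_ms(first, second):.1f} ms")
```

Running this once during normal operation and once during a slow period gives two numbers to compare; a much higher await while indexing is slow would point towards storage, and possibly merging, as the bottleneck.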
