Regular I/O peaks on Elasticsearch data disk

We observed regular I/O peaks on the Elasticsearch data disk; they are triggered every 10 minutes. How can I find which function triggers this? There is nothing in the ES logs.

What is this a graph of exactly?

It is the number of transactions per minute on Azure Files, taken from the metrics in the Azure Portal. We use Azure Files as the data disk, and the ELK stack runs in AKS via Elastic Cloud on Kubernetes.

You mean https://docs.microsoft.com/en-us/azure/storage/files/storage-files-introduction? If so, it's not really recommended to run Elasticsearch on these sorts of filesystems.

We use a hot/cold architecture; the cold nodes use Azure Files as their data disk. The Azure Files pricing model charges more for I/O than a regular disk does, but the storage itself is cheaper. By default, data is written to the hot nodes and rolled over to the cold nodes by ILM, roughly as sketched below.
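For context, our rollover setup looks roughly like this (a simplified sketch, not our exact policy; the endpoint, policy name, thresholds, and the `data=cold` node attribute are placeholders):

```python
# Sketch of an ILM policy: roll over on the hot tier, then move shards to
# nodes tagged as cold. Names and thresholds below are placeholders.
import requests

ES = "http://localhost:9200"  # placeholder endpoint; add auth/TLS as needed

policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {"max_age": "1d", "max_size": "50gb"}
                }
            },
            "cold": {
                "min_age": "7d",
                "actions": {
                    # allocate shards to nodes carrying the (placeholder) attribute data=cold
                    "allocate": {"require": {"data": "cold"}}
                },
            },
        }
    }
}

resp = requests.put(f"{ES}/_ilm/policy/my-hot-cold-policy", json=policy)
resp.raise_for_status()
print(resp.json())
```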

The problem is that there is a lot of I/O even when no rollover happens. There are regular I/O peaks on the cold nodes, and we want to avoid them to reduce pointless cost, so we want to know how to troubleshoot this and find the root cause. I didn't find any scheduled task mentioned in the documentation that could explain it.
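In case it helps, this is the kind of check we are trying on our side (a rough sketch; the endpoint and the cold node-name pattern are placeholders): poll the node stats on the cold nodes and watch which counters jump roughly every 10 minutes, then match that against the Azure Files transaction spikes.

```python
# Sketch: sample merge/refresh/flush/translog counters from node stats and
# print the deltas between samples. A counter that jumps every ~10 minutes
# is a candidate for the periodic I/O.
import time
import requests

ES = "http://localhost:9200"   # placeholder endpoint
NODE_FILTER = "*cold*"         # placeholder node-name pattern for the cold tier
INTERVAL_SECONDS = 60

def sample():
    """Return {node_name: {counter_name: value}} for a few I/O-related counters."""
    stats = requests.get(f"{ES}/_nodes/{NODE_FILTER}/stats/indices").json()
    out = {}
    for node in stats["nodes"].values():
        idx = node["indices"]
        out[node["name"]] = {
            "merge_bytes": idx["merges"]["total_size_in_bytes"],
            "merge_count": idx["merges"]["total"],
            "refresh_count": idx["refresh"]["total"],
            "flush_count": idx["flush"]["total"],
            # translog.operations is a gauge, so its delta can go negative
            "translog_ops": idx["translog"]["operations"],
        }
    return out

previous = sample()
while True:
    time.sleep(INTERVAL_SECONDS)
    current = sample()
    ts = time.strftime("%H:%M:%S")
    for name, counters in current.items():
        deltas = {k: v - previous.get(name, {}).get(k, v) for k, v in counters.items()}
        if any(deltas.values()):
            print(ts, name, deltas)
    previous = current
```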

You might want to enable Monitoring on your cluster to see what is happening; it could be merges, for example.
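Alongside Monitoring, a quick one-off check could be something like this (just a sketch, the endpoint is a placeholder): pull index-level merge/refresh/flush stats and see whether a particular index on the cold nodes accounts for most of the activity.

```python
# Sketch: list indices by merged bytes to spot which index is doing the work.
import requests

ES = "http://localhost:9200"  # placeholder endpoint

stats = requests.get(f"{ES}/_stats/merge,refresh,flush", params={"level": "indices"}).json()

rows = []
for index_name, data in stats["indices"].items():
    total = data["total"]
    rows.append((
        total["merges"]["total_size_in_bytes"],
        total["merges"]["total"],
        total["refresh"]["total"],
        total["flush"]["total"],
        index_name,
    ))

for merge_bytes, merges, refreshes, flushes, index_name in sorted(rows, reverse=True)[:20]:
    print(f"{index_name}: merged {merge_bytes} bytes over {merges} merges, "
          f"{refreshes} refreshes, {flushes} flushes")
```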

Thanks warkolm, do you have a link about the Monitoring you mentioned? We are using the Basic license, so some advanced features may not be available to us.

Check out https://www.elastic.co/kibana/features#full-stack-monitoring, it's part of Basic.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.