Hi All,
We have NetApp storage in our environment. There is one cluster that is hardly used by the application, so very little data traffic comes into it.
We are using the Zabbix monitoring tool to monitor Elasticsearch and NetApp statistics. Whenever we stop and start the Elasticsearch service on any one of the 3 nodes, NetApp immediately reports very high volume latency alerts for that particular volume. We don't see any such high latency alerts when we restart other applications, databases, or services; they are reported only for Elasticsearch volumes.
When I checked the CPU and memory usage of the node at the time the alert was triggered, I did not notice anything abnormal. The cluster was healthy and very responsive to all CLI commands.
The threshold set for this volume latency alert is 5000 ms, and NetApp recorded latency values as high as 12000 ms.
We are not sure what is causing the problem. Can anyone please help us find the cause and suggest a fix?
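In case it helps with diagnosis, here is a rough sketch of how the restart could be lined up against the latency graph: poll Elasticsearch's cluster health and active shard recoveries while a node restarts, and print timestamps that can be compared with the Zabbix chart. It assumes the cluster is reachable at http://localhost:9200 without authentication; that URL and the function names are placeholders, not our actual setup.

```python
import json
import time
import urllib.request
from collections import Counter

# Assumption: an unauthenticated HTTP endpoint; adjust for the real cluster.
ES_URL = "http://localhost:9200"


def fetch_json(path):
    """GET a path from Elasticsearch and parse the JSON response."""
    with urllib.request.urlopen(ES_URL + path, timeout=10) as resp:
        return json.load(resp)


def count_by_stage(recoveries):
    """Count shard recoveries per stage (e.g. index, translog, done)."""
    return dict(Counter(row.get("stage", "unknown") for row in recoveries))


def poll(interval_s=5, rounds=60):
    """Print one timestamped line per interval while a node restarts."""
    for _ in range(rounds):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        try:
            health = fetch_json("/_cluster/health")
            active = fetch_json("/_cat/recovery?active_only=true&format=json")
            print(
                stamp,
                health["status"],
                "initializing:", health["initializing_shards"],
                "unassigned:", health["unassigned_shards"],
                "recoveries:", count_by_stage(active),
            )
        except OSError as exc:
            # The node being restarted may refuse connections for a while.
            print(stamp, "unreachable:", exc)
        time.sleep(interval_s)
```

Running `poll()` just before restarting the Elasticsearch service on one node would give a timeline of recovery activity. If the latency spike lines up with active recoveries, that would point toward recovery I/O rather than application traffic, though this alone does not prove the cause.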
Cluster Configuration:
3-node cluster
40GB memory (20GB allocated as heap)
8 vCPU cores
350GB disk per node (path.data)
Below is a screenshot from the Zabbix dashboard showing the NetApp volume associated with this cluster: