Elasticsearch Flush latency is too high - by Zabbix monitoring

Hi All,
I have a 3-node, all-master-eligible cluster with 8 vCPU cores and 40 GB of memory on each node.

We have Zabbix monitoring enabled on all the nodes, and it frequently triggers a "Flush latency is too high" alert, reporting over 3000 ms on the master node and sometimes on the other nodes too. Sometimes the alert stays active for 1-2 hours before clearing; other times it clears automatically after 5-10 minutes.

When I checked Stack Monitoring, I did not notice any abnormal spikes in the graphs.

Can anyone please tell me how to resolve these Zabbix alerts? My elasticsearch.yml is below.

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: elasticsearch
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: node-1
#
# Add custom attributes to the node:
#
xpack.security.enabled: false
xpack.monitoring.collection.enabled: true
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /database/elasticsearch
#
# Path to log files:
#
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# By default Elasticsearch is only accessible on localhost. Set a different
# address here to expose this node on the network:
#
network.host: 0.0.0.0
network.publish_host: "192.168.XXX.XXX"
#
# By default Elasticsearch listens for HTTP traffic on the first free port it
# finds starting at 9200. Set a specific HTTP port here:
#
http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.seed_hosts: ["host1", "host2"]
discovery.seed_hosts: ["192.168.XXX.XX1", "192.168.XXX.XX2", "192.168.XXX.XX3"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
#cluster.initial_master_nodes: ["node-1", "node-2"]
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
action.destructive_requires_name: true
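
For reference, the raw counters that a flush-latency item like this is derived from can be read straight from the node stats API. Here is a minimal sketch in Python using requests (the node address is redacted the same way as in the config above, and no auth is assumed since xpack.security is disabled):

import requests

NODE = "http://192.168.XXX.XX1:9200"  # any node in the cluster; address redacted as in the config

# Cumulative flush counters per node since the node was started
stats = requests.get(f"{NODE}/_nodes/stats/indices/flush", timeout=10).json()
for node in stats["nodes"].values():
    flush = node["indices"]["flush"]
    count = flush["total"]
    total_ms = flush["total_time_in_millis"]
    avg_ms = total_ms / count if count else 0.0
    print(f"{node['name']}: {count} flushes, {total_ms} ms total, {avg_ms:.1f} ms average")

If those averages look reasonable, the Zabbix figure is most likely short-lived variation rather than a real problem.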

What type of storage are you using? How many indices and shards are you actively indexing into?

Is there actually a problem with your Elasticsearch installation? This alert seems unnecessary (or at least unreasonably sensitive), so the simplest fix would be to disable it.

Hi Christian,
Thank you for your response.
We are using SSD storage, and we create 3 indices per day with about 4 GB of total index size per day (including replication). Retention is around 20 days, with roughly 1 TB of storage across all nodes.
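
In case it helps, those figures can be cross-checked against the _cat/indices API. A rough sketch in Python, with the node address redacted as in the config above:

import requests

NODE = "http://192.168.XXX.XX1:9200"  # any node; address redacted as in the config

# One row per index, sorted by on-disk size (in MB), with primary/replica shard counts
rows = requests.get(
    f"{NODE}/_cat/indices",
    params={"format": "json", "bytes": "mb", "s": "store.size:desc"},
    timeout=10,
).json()

for row in rows:
    print(f"{row['index']}: {row['pri']} primaries, {row['rep']} replicas, {row['store.size']} MB")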

Hi @DavidTurner
Thank you for the response.

I don't see any problem with the installation as such; the cluster is working normally.
Can you suggest what the ideal threshold value for flush latency would be?
Currently it's 100 ms, which is actually very low.

I don't think it's a question of setting the right threshold so much as whether to alert on this metric at all. I'd say that it's not a good candidate for alerting. It could be very variable even in a perfectly healthy cluster, but as long as everything is performing well then why worry?
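
To make that concrete: a flush-latency figure of this kind is typically a ratio of deltas between two polls of the cumulative flush counters, so a single slow flush in a quiet interval can dominate it. A rough illustration in Python (not the exact Zabbix preprocessing; node address redacted as above):

import time
import requests

NODE = "http://192.168.XXX.XX1:9200"  # any node; address redacted as in the config above

def cluster_flush_counters():
    # Sum the cumulative flush count and flush time (ms) across all nodes
    stats = requests.get(f"{NODE}/_nodes/stats/indices/flush", timeout=10).json()
    nodes = stats["nodes"].values()
    count = sum(n["indices"]["flush"]["total"] for n in nodes)
    millis = sum(n["indices"]["flush"]["total_time_in_millis"] for n in nodes)
    return count, millis

c1, ms1 = cluster_flush_counters()
time.sleep(60)  # one polling interval
c2, ms2 = cluster_flush_counters()

flushes = c2 - c1
avg_ms = (ms2 - ms1) / flushes if flushes else 0.0
print(f"{flushes} flushes in the interval, averaging {avg_ms:.0f} ms each")

With only a handful of flushes per interval, one flush that happens to take a few seconds pushes that average well past a 100 ms threshold even though nothing is wrong.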


Yeah, okay!
Thank you so much!
We have disabled this alert in Zabbix :slight_smile:
