Hi All,
I'm attempting to debug a somewhat strange issue. I have a query which runs every 60 seconds to check a set of logs, and if there are no logs for 90 seconds it triggers an alert (this is done via a Kibana rule).
These logs get generated (and in theory indexed) on a regular interval, every ~20 seconds, so the only time this alert should fire is when logs are not getting generated for some reason. However, I have now had a number of false positive alerts where the rule thinks there are no logs, but when I check, the logs do exist for the alerting window (I am looking at both the @timestamp and event.ingested times).
My hypothesis is that these false positives happen because the Elasticsearch node holding the index becomes overloaded (CPU maxed out) for a period of time, which causes a shard refresh to take longer than expected, so the query does not see the logs even though they technically exist. What I haven't found is a way to prove or disprove this hypothesis, because by the time I actually get to look at the logs in question, they exist. Does anyone know of a way to have Elasticsearch log when a shard refresh takes longer than a specific duration? (Or have another idea for trying to debug this issue?)
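In case it helps, one thing I was considering is periodically polling the refresh stats and the refresh thread pool to try to catch a slowdown as it happens (the index name below is just a placeholder for my actual index):

```
GET my-logs-index/_stats/refresh

GET _cat/thread_pool/refresh?v&h=node_name,active,queue,rejected,completed
```

My thinking is that if I snapshot `refresh.total_time_in_millis` and `refresh.total` on an interval, a spike in the per-refresh average (delta of time divided by delta of count) around the time of a false positive would support the theory, as would a growing `queue` or non-zero `rejected` count on the refresh thread pool. But this feels like a workaround, so I'd still prefer something built in that logs slow refreshes directly, if such a thing exists.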
For reference, I'm running Elastic Stack 8.5.3.