Timing issues with Watcher

alerting

#1

I have a watcher that checks for specific logfile entries and I need its action to trigger exactly once for any entries found within a specific timeframe. It currently checks for logfile entries once a minute using the following range:

"range": {
    "@timestamp": {
        "gte": "{{ ctx.trigger.scheduled_time }}||-1m",
        "lte": "{{ ctx.trigger.scheduled_time }}"
    }
}

Because of network and processing delays, a logfile entry is often indexed only after the watcher for its interval has already run, especially if the entry was written to the file a few milliseconds before the watcher triggered and Logstash hasn't processed it yet. Subsequent watcher queries will not find the entry either, since its timestamp falls in the previous interval. E.g.:

  • Watcher checks for timestamps in the range 00:00:00 - 00:01:00 and finds no entries
  • A logfile entry with a timestamp of 00:00:59 gets indexed by Logstash only after that watcher run
  • Watcher checks for timestamps in the range 00:01:00 - 00:02:00 and finds no entries

There are a few suboptimal ways to deal with this:

  • Increase the range to two minutes into the past. Logfile entries then often trigger the watcher's action twice.
  • Have the watcher run less often, e.g. every 5 minutes. Boundary issues are less probable but still do happen.

I had the idea of going with the first option and tagging already processed logfile entries. Alas, there's no update action, only an index action. Is there a simple solution that doesn't involve webhooks back to Elastic?


(Alexander Reelsen) #2

Hey,

there is no easy way of doing this. A delayed pipeline is one part, but on top of that you have the refresh interval (defaults to one second) before a document becomes available for search.

One approach to solving your query issue could be an ingest pipeline that adds an indexedAt timestamp, which could then be queried. This timestamp would be the moment the document is indexed in Elasticsearch and would thus be independent of the pipeline in front of it.
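
A minimal sketch of such a pipeline body, assuming it is registered under a hypothetical ID such as add-indexed-at (via PUT _ingest/pipeline/add-indexed-at) and that the field is called indexedAt as suggested above. The set processor copies the ingest metadata value _ingest.timestamp into each document:

```json
{
  "description": "Sketch only: stamp each document with the time it was ingested (field name indexedAt is an assumption)",
  "processors": [
    {
      "set": {
        "field": "indexedAt",
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}
```

The watcher's range query could then filter on indexedAt instead of @timestamp. The refresh interval mentioned above still applies, so a document stamped shortly before a watcher run may not be searchable yet.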

If you really need to execute against every document indexed and everything should be exactly once, have you thought about using the percolator?
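
For orientation, a sketch of what a percolate search body might look like. It assumes an index that already has a field mapped with type percolator holding the stored alert query; the field names query and message and the sample document are hypothetical, and the exact request shape depends on the Elasticsearch version:

```json
{
  "query": {
    "percolate": {
      "field": "query",
      "document": {
        "message": "example logfile entry to match against the stored queries"
      }
    }
  }
}
```

The response lists the stored queries that match the supplied document, so each incoming entry is evaluated once as it arrives rather than by a time window.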

Hope this helps as a first overview, feel free to ask further.

--Alex


#3

Hello,

thanks for your reply. It just occurred to me that I might as well run a "delayed" watcher, which would essentially solve my timing problems under normal circumstances. E.g.:

"range": {
    "@timestamp": {
        "gte": "{{ ctx.trigger.scheduled_time }}||-2m",
        "lte": "{{ ctx.trigger.scheduled_time }}||-1m"
    }
}

It seems obvious in hindsight; I blame the lack of rubber ducks.
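
One detail that matters for the exactly-once goal: with both bounds inclusive (gte and lte), an entry whose timestamp falls exactly on a minute boundary matches two consecutive runs. A sketch of the same delayed window as a half-open range, assuming the watcher still runs once a minute:

```json
{
  "range": {
    "@timestamp": {
      "gte": "{{ ctx.trigger.scheduled_time }}||-2m",
      "lt": "{{ ctx.trigger.scheduled_time }}||-1m"
    }
  }
}
```

Using lt for the upper bound means each timestamp belongs to exactly one window, provided consecutive scheduled times are exactly one minute apart.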


(Alexander Reelsen) #4

Hey,

this still assumes you never have congestion in your ingest pipeline (network outage, maintenance, etc.), which is just something to be aware of. But if that works for you, all good here :)

Thanks for posting your solution, much appreciated.

--Alex

