Watcher false alerts (query wrong?)

lueneburger · August 8, 2017, 7:27am

Hi Everyone,

I have some Watcher alerts that check whether the separate indices are getting new entries. For example, if an SMTP server hasn't sent anything within 1 hour, we get an alert so we can check what's going on there.

Over the last few days I've received some false alerts: Watcher fired the alert, but when I look in Elasticsearch I can see more than 100 log entries within that 1-hour window.

Here is a snippet of the watch:

"input": {
  "search": {
    "request": {
      "search_type": "query_then_fetch",
      "indices": [],
      "types": [],
      "body": {
        "size": 0,
        "query": {
          "bool": {
            "must": [
              {
                "match": {
                  "_type": "postfix"
                }
              }
            ],
            "filter": {
              "range": {
                "@timestamp": {
                  "gte": "now-1H",
                  "lt": "now"
                }
              }
            }
          }
        }
      }
    }
  }
},
"condition": {
  "compare": {
    "ctx.payload.hits.total": {
      "lt": 1
    }
  }
}

Did I do something wrong with the time math, or am I just blind? I'd appreciate any help with it.

Thanks,
Dirk
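For reference, the range filter and compare condition in the watch snippet above amount to the check below: count documents whose @timestamp falls in the half-open window [now-1h, now) and fire when that count is below 1. This is a minimal Python sketch assuming timezone-aware datetimes; `should_alert` is a hypothetical helper, and it does not model how Elasticsearch actually executes the query across shards.

```python
from datetime import datetime, timedelta, timezone


def should_alert(timestamps, now):
    """Hypothetical mirror of the watch logic.

    Counts @timestamp values in [now - 1h, now), like the range filter
    with "gte": "now-1H" and "lt": "now", and returns True when the count
    is below 1, like the compare condition on ctx.payload.hits.total.
    """
    window_start = now - timedelta(hours=1)
    hits_total = sum(1 for ts in timestamps if window_start <= ts < now)
    return hits_total < 1


now = datetime(2017, 8, 8, 2, 26, tzinfo=timezone.utc)
print(should_alert([now - timedelta(minutes=10)], now))  # False: one hit in the window
print(should_alert([now - timedelta(hours=2)], now))     # True: nothing in the last hour
```

Because "lt": "now" is exclusive, a document stamped exactly at now is not counted; apart from that, the window itself is a plain one-hour range.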

spinscale · August 8, 2017, 7:46am

Please share a watch history entry from one of the executions that fired by mistake.

Thanks!

lueneburger · August 8, 2017, 8:08am

Hi spinscale,

Oh OK, I just looked into the history:

..
  "result": {
    "execution_time": "2017-08-08T02:26:27.923Z",
    "execution_duration": 1860,
    "input": {
      "type": "search",
      "status": "success",
      "payload": {
        "_shards": {
          "total": 142,
          "failures": [
            {
              "node": "TrF3vJmOSw-EhN3_Ybgglg",
              "reason": {
                "reason": "rejected execution of org.elasticsearch.transport.TransportService$7@41700a23 on EsThreadPoolExecutor[search, queue capacity = 1000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@65b91de2[Running, pool size = 4, active threads = 4, queued tasks = 1000, completed tasks = 218052]]",
                "type": "es_rejected_execution_exception"
              },
              "index": "test-index-2017.07.23",
              "shard": 0
            },
            {
              "node": "TrF3vJmOSw-EhN3_Ybgglg",
              "reason": {
                "reason": "rejected execution of org.elasticsearch.transport.TransportService$7@4d0f6503 on EsThreadPoolExecutor[search, queue capacity = 1000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@65b91de2[Running, pool size = 4, active threads = 4, queued tasks = 1000, completed tasks = 218054]]",
                "type": "es_rejected_execution_exception"
              },
              "index": "test-index-2017.07.26",
              "shard": 0
            },
            {
              "node": "TrF3vJmOSw-EhN3_Ybgglg",
              "reason": {
                "reason": "rejected execution of org.elasticsearch.transport.TransportService$7@573a9ee9 on EsThreadPoolExecutor[search, queue capacity = 1000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@65b91de2[Running, pool size = 4, active threads = 4, queued tasks = 1000, completed tasks = 218054]]",
                "type": "es_rejected_execution_exception"
              },
              "index": "test-index-2017.07.27",
              "shard": 0
            }
..

I've never seen that before... oh, wait... did I just forget to configure the target indices?

 "watch": {
    "trigger": {
      "schedule": {
        "interval": "1h"
      }
    },
    "input": {
      "search": {
        "request": {
          "search_type": "query_then_fetch",
          "indices": [],
          "types": [],
          "body": {

Can I use today's date in the index names, like indices: [test-index-{Date}]?

spinscale · August 8, 2017, 8:37am

Hey Dirk,

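Soooo... you are querying across all of your indices, which means every execution fires a query that covers a lot of shards (142 in the history entry above). Together with the queries that run on that cluster anyway, this can exhaust the search thread pool on certain nodes. A node can only execute a fixed number of searches in parallel (pool size = 4 in your error) and keeps an in-memory queue for searches waiting to run (queue capacity = 1000). If both are full, further shard requests are rejected with the es_rejected_execution_exception shown above. The rejected shards contribute no hits, but the input still reports status "success", so if the shards holding the last hour of data were among them, hits.total can come back as 0 and the compare condition fires even though the documents exist.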
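To make that failure mode concrete, here is a minimal Python sketch of a check that treats shard failures as "inconclusive" instead of as "no data". `evaluate_payload` and its three outcomes are hypothetical, the example payloads are made up for illustration, and it assumes hits.total is a plain integer as in the 5.x-era payload in this thread. In Watcher itself the equivalent logic would have to live in a script condition, which is not shown here.

```python
def evaluate_payload(payload):
    """Hypothetical helper: classify a search-input payload as
    "alert", "ok", or "inconclusive".

    Rejected shards (e.g. es_rejected_execution_exception) return no hits,
    so a low hits.total only means something if every shard answered.
    Assumes hits.total is a plain integer.
    """
    shards = payload.get("_shards", {})
    # Prefer the explicit failure count; fall back to counting listed failures.
    failed = shards.get("failed") or len(shards.get("failures", []))
    if failed > 0:
        return "inconclusive"
    return "alert" if payload["hits"]["total"] < 1 else "ok"


# Illustrative payloads, not taken from the history entry above.
healthy_empty = {"_shards": {"total": 2, "successful": 2, "failed": 0},
                 "hits": {"total": 0}}
partially_rejected = {"_shards": {"total": 142, "successful": 139, "failed": 3},
                      "hits": {"total": 0}}
print(evaluate_payload(healthy_empty))       # alert
print(evaluate_payload(partially_rejected))  # inconclusive
```

Narrowing the query to the relevant indices addresses the root cause; a check like this only keeps a rejected search from being reported as missing data.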

An immediate workaround is to use date math in the index names.

If you have daily indices, it sounds as if querying just the last two is sufficient: including yesterday's index keeps the one-hour window covered when it rolls over midnight.

You can specify the indices like this: <test-index-{now/d}>,<test-index-{now/d-1d}>
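A small Python sketch of the two date-math expressions and what they resolve to at a given moment. It assumes Elasticsearch's default date format for these expressions (yyyy.MM.dd, matching names like test-index-2017.07.23) and the default UTC time zone. `date_math_indices` and `resolve_daily_indices` are hypothetical helpers for illustration; the real resolution is done by Elasticsearch when the watch runs, not by this code.

```python
import json
from datetime import datetime, timedelta, timezone


def date_math_indices(prefix):
    """Today's and yesterday's daily index as date-math expressions."""
    # Doubled braces in the f-string produce literal { and }.
    return [f"<{prefix}{{now/d}}>", f"<{prefix}{{now/d-1d}}>"]


def resolve_daily_indices(prefix, now):
    """Approximate what those expressions resolve to at time `now`,
    assuming the yyyy.MM.dd format and UTC."""
    today = now.astimezone(timezone.utc).date()
    return [f"{prefix}{d:%Y.%m.%d}" for d in (today, today - timedelta(days=1))]


now = datetime(2017, 8, 8, 0, 30, tzinfo=timezone.utc)
print(json.dumps({"indices": date_math_indices("test-index-")}))
# {"indices": ["<test-index-{now/d}>", "<test-index-{now/d-1d}>"]}
print(resolve_daily_indices("test-index-", now))
# ['test-index-2017.08.08', 'test-index-2017.08.07']
```

At 00:30 UTC the one-hour window starts at 23:30 the previous day, so documents from both test-index-2017.08.07 and test-index-2017.08.08 can fall inside it; that is why yesterday's index is kept in the list.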

Hope this helps!

lueneburger · August 8, 2017, 9:03am

Hey spinscale,

Awesome, it works perfectly with the date math.

I'm changing the watches right now. Thanks for the fast help, and have a nice day!

Cheers,
Dirk

