Watcher false alerts (query wrong?)

alerting

(Dirk Lüneburger) #1

Hi Everyone,

I have some Watcher alerts that check whether separate indices are receiving new entries. If, for example, an SMTP server hasn't sent anything within 1 hour, we get an alert so we can check what is going on there.

In the last few days I got some false alerts: Watcher fired the alert, but when I look into ES I can see more than 100 logs within the 1-hour time span.

Here is a snippet of the watch:

  "search_type": "query_then_fetch",
  "indices": [],
  "types": [],
  "body": {
    "size": 0,
    "query": {
      "bool": {
        "must": [
          {
            "match": {
              "_type": "postfix"
            }
          }
        ],
        "filter": {
          "range": {
            "@timestamp": {
              "gte": "now-1H",
              "lt": "now"
            }
          }
        }
      }
    }
  }
}
}
},
"condition": {
  "compare": {
    "ctx.payload.hits.total": {
      "lt": 1

Did I do something wrong with the time math, or am I just blind? I appreciate any help with it :slight_smile:

thanks,
Dirk


(Alexander Reelsen) #2

Please share a watch history entry from one of the executions that fired the alert by mistake.

Thanks!


(Dirk Lüneburger) #3

hi spinscale :slight_smile:

oh ok, just looked into the history:

..
  "result": {
    "execution_time": "2017-08-08T02:26:27.923Z",
    "execution_duration": 1860,
    "input": {
      "type": "search",
      "status": "success",
      "payload": {
        "_shards": {
          "total": 142,
          "failures": [
            {
              "node": "TrF3vJmOSw-EhN3_Ybgglg",
              "reason": {
                "reason": "rejected execution of org.elasticsearch.transport.TransportService$7@41700a23 on EsThreadPoolExecutor[search, queue capacity = 1000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@65b91de2[Running, pool size = 4, active threads = 4, queued tasks = 1000, completed tasks = 218052]]",
                "type": "es_rejected_execution_exception"
              },
              "index": "test-index-2017.07.23",
              "shard": 0
            },
            {
              "node": "TrF3vJmOSw-EhN3_Ybgglg",
              "reason": {
                "reason": "rejected execution of org.elasticsearch.transport.TransportService$7@4d0f6503 on EsThreadPoolExecutor[search, queue capacity = 1000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@65b91de2[Running, pool size = 4, active threads = 4, queued tasks = 1000, completed tasks = 218054]]",
                "type": "es_rejected_execution_exception"
              },
              "index": "test-index-2017.07.26",
              "shard": 0
            },
            {
              "node": "TrF3vJmOSw-EhN3_Ybgglg",
              "reason": {
                "reason": "rejected execution of org.elasticsearch.transport.TransportService$7@573a9ee9 on EsThreadPoolExecutor[search, queue capacity = 1000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@65b91de2[Running, pool size = 4, active threads = 4, queued tasks = 1000, completed tasks = 218054]]",
                "type": "es_rejected_execution_exception"
              },
              "index": "test-index-2017.07.27",
              "shard": 0
            }
..

Never saw that before... oh... wait... did I just forget to configure the index target?

 "watch": {
    "trigger": {
      "schedule": {
        "interval": "1h"
      }
    },
    "input": {
      "search": {
        "request": {
          "search_type": "query_then_fetch",
          "indices": [],
          "types": [],
          "body": {

Can I use today's date in the indices, like indices: [test-index-{Date}]?


(Alexander Reelsen) #4

Hey Dirk,

Sooo... you are querying across all of your indices, which means you fire a query that covers a lot of shards (142 in the history entry above). This can lead to an exception when you (possibly together with other queries that run on that cluster anyway) exhaust the thread pool resources of certain nodes. A node can only execute n queries in parallel and has an in-memory queue to put waiting searches in. If both are full, the above error message is returned. In your history entry the search pool had 4 active threads out of 4 and 1000 queued tasks against a queue capacity of 1000.
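Shards that reject the search contribute no hits, so the total can fall below the threshold even though documents exist. As a rough sketch, and not something tested against your cluster, the condition could be written as a script so that the watch only alerts when no shard failed. The `inline` key and the `_shards.failed` field are assumptions based on the standard search response and 5.x-style script conditions; check them against your version before using this.

```json
{
  "condition": {
    "script": {
      "lang": "painless",
      "inline": "return ctx.payload._shards.failed == 0 && ctx.payload.hits.total < 1"
    }
  }
}
```

This only suppresses the false alert. It does not reduce the number of shards the query touches.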

An immediate workaround would be to use date math in index names.

If you have daily indices, it sounds as if it is sufficient to query the last two of them, so that the watch still works when rolling over midnight.

You can specify the indices like this: <test-index-{now/d}>,<test-index-{now/d-1d}>
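Applied to the input from your watch, this could look roughly like the fragment below. It is a sketch that assumes your daily indices follow the `test-index-YYYY.MM.dd` naming seen in the history entry; the query body is copied unchanged from your snippet.

```json
{
  "input": {
    "search": {
      "request": {
        "search_type": "query_then_fetch",
        "indices": [
          "<test-index-{now/d}>",
          "<test-index-{now/d-1d}>"
        ],
        "types": [],
        "body": {
          "size": 0,
          "query": {
            "bool": {
              "must": [
                { "match": { "_type": "postfix" } }
              ],
              "filter": {
                "range": {
                  "@timestamp": { "gte": "now-1H", "lt": "now" }
                }
              }
            }
          }
        }
      }
    }
  }
}
```

With two daily indices the search only touches the shards of today's and yesterday's index instead of every shard in the cluster.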

hope this helps!


(Dirk Lüneburger) #5

Hey spinscale,

Awesome, it works perfectly with the date math :slight_smile:

I'm changing the watches right now. Thanks for the fast help and have a nice day.

Cheers,
Dirk

