Good place to locate a Kibana/Watcher expert?

I'm looking for someone to help us understand and create Watcher alerts for incoming Metricbeat data. Basic alerting such as network, disk, CPU, and memory.

Is there a place in the community for this? Or does anyone have suggestions on outside locations where I could look?

Thanks,
KG

Assuming you have a licence, why not create a support request? That's what I do when I need help...

This was their reply: "It looks like you are asking for assistance with configuring Watcher. As you are a Cloud Standard customer, we are able to assist you with break/fix issues on your ESS cluster. This question appears to be outside of this break/fix scope."

I can't help you find an expert/consultant, but if you start by writing down your issue in a bit more detail, and also check out the examples alerting repo to get up and running, that might be a first step towards a possible solution and more understanding.

And of course, there are Elastic commercial offerings like subscriptions, but I am not sure if that is what you are after.

Hope that helps!

I will post my Watch here. This is running on Elastic Cloud, and I used the repo to get me started.

The purpose of this Watch was to alert when any disk volume is above 80%. It does not fire:

    {
      "trigger": {
        "schedule": {
          "interval": "5m"
        }
      },
      "input": {
        "search": {
          "request": {
            "search_type": "query_then_fetch",
            "indices": [
              "metricbeat-*"
            ],
            "types": [
              "filesystem"
            ],
            "rest_total_hits_as_int": true,
            "body": {
              "aggs": {
                "host": {
                  "terms": {
                    "field": "host.hostname",
                    "order": {
                      "disk_usage": "desc"
                    }
                  },
                  "aggs": {
                    "disk_usage": {
                      "max": {
                        "field": "system.filesystem.used.pct"
                      }
                    }
                  }
                }
              },
              "query": {
                "bool": {
                  "filter": [
                    {
                      "range": {
                        "@timestamp": {
                          "gte": "now-{{ctx.metadata.window_period}}"
                        }
                      }
                    },
                    {
                      "range": {
                        "disk_usage": {
                          "gte": "{{ctx.metadata.threshold}}"
                        }
                      }
                    }
                  ]
                }
              }
            }
          }
        }
      },
      "condition": {
        "compare": {
          "ctx.payload.hits.total": {
            "gt": 0
          }
        }
      },
      "actions": {
        "email_me": {
          "throttle_period_in_millis": 60000,
          "email": {
            "profile": "standard",
            "from": "username@example.org",
            "to": [
              "myemail@myemail.com"
            ],
            "subject": "Disk Full",
            "body": {
              "html": "Some hosts are over {{ctx.payload.threshold}}% utilized:{{#ctx.payload.hosts}}{{disk_usage}}%-{{key}}:{{/ctx.payload.hosts}}"
            }
          }
        },
        "log": {
          "logging": {
            "level": "info",
            "text": "Some hosts are over {{ctx.payload.threshold}}% utilized:{{#ctx.payload.hosts}}{{disk_usage}}%-{{key}}:{{/ctx.payload.hosts}}"
          }
        }
      },
      "metadata": {
        "window_period": "15m",
        "threshold": 0.8
      },
      "transform": {
        "search": {
          "request": {
            "search_type": "query_then_fetch",
            "indices": [
              "log-events"
            ],
            "rest_total_hits_as_int": true,
            "body": {
              "query": {
                "match": {
                  "status": "error"
                }
              }
            }
          }
        }
      }
    }

The output of the Execute Watch API would help a lot here.

Also, this blog post I wrote a few years back is still valid and should help you get into a fast write/debug loop when dealing with watches, allowing you to easily figure out when something is wrong.
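For reference, a watch can be run on demand in Kibana Dev Tools with the Execute Watch API. This is just a sketch: `my_watch` is a placeholder id, and the two flags shown here (forcing action evaluation, and not persisting the run to the watch history) are optional:

```
POST _watcher/watch/my_watch/_execute
{
  "ignore_condition": true,
  "record_execution": false
}
```

You can also pass a full watch definition inline under a `"watch"` key in the request body, which lets you iterate on a watch without saving it between attempts.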

Thanks for that, reading through your blog post now. Here is the result from execute API:

#! Deprecation: [types removal] Specifying types in a watcher search request is deprecated.
{
  "_id" : "531b40c2-4e64-4a83-9eb4-989cd465f474_d31688ac-7e56-40ab-a5a8-07481c5b318f-2020-03-17T17:40:52.556865Z",
  "watch_record" : {
    "watch_id" : "531b40c2-4e64-4a83-9eb4-989cd465f474",
    "node" : "w88m1bFtSryU64LkC72IdA",
    "state" : "execution_not_needed",
    "user" : "elastic",
    "status" : {
      "state" : {
        "active" : true,
        "timestamp" : "2020-03-09T17:38:15.946Z"
      },
      "last_checked" : "2020-03-17T17:40:52.556Z",
      "actions" : {
        "email_me" : {
          "ack" : {
            "timestamp" : "2020-03-09T17:38:15.946Z",
            "state" : "awaits_successful_execution"
          }
        },
        "log" : {
          "ack" : {
            "timestamp" : "2020-03-09T17:38:15.946Z",
            "state" : "awaits_successful_execution"
          }
        }
      },
      "execution_state" : "execution_not_needed",
      "version" : 2354
    },
    "trigger_event" : {
      "type" : "manual",
      "triggered_time" : "2020-03-17T17:40:52.556Z",
      "manual" : {
        "schedule" : {
          "scheduled_time" : "2020-03-17T17:40:52.556Z"
        }
      }
    },
    "input" : {
      "search" : {
        "request" : {
          "search_type" : "query_then_fetch",
          "indices" : [
            "metricbeat-*"
          ],
          "types" : [
            "filesystem"
          ],
          "rest_total_hits_as_int" : true,
          "body" : {
            "aggs" : {
              "host" : {
                "terms" : {
                  "field" : "host.hostname",
                  "order" : {
                    "disk_usage" : "desc"
                  }
                },
                "aggs" : {
                  "disk_usage" : {
                    "max" : {
                      "field" : "system.filesystem.used.pct"
                    }
                  }
                }
              }
            },
            "query" : {
              "bool" : {
                "filter" : [
                  {
                    "range" : {
                      "@timestamp" : {
                        "gte" : "now-{{ctx.metadata.window_period}}"
                      }
                    }
                  },
                  {
                    "range" : {
                      "disk_usage" : {
                        "gte" : "{{ctx.metadata.threshold}}"
                      }
                    }
                  }
                ]
              }
            }
          }
        }
      }
    },
    "condition" : {
      "compare" : {
        "ctx.payload.hits.total" : {
          "gt" : 0
        }
      }
    },
    "metadata" : {
      "window_period" : "15m",
      "name" : "Disk Used Test - Karn",
      "threshold" : 0.8,
      "xpack" : {
        "type" : "json"
      }
    },
    "result" : {
      "execution_time" : "2020-03-17T17:40:52.556Z",
      "execution_duration" : 21,
      "input" : {
        "type" : "search",
        "status" : "success",
        "payload" : {
          "_shards" : {
            "total" : 46,
            "failed" : 0,
            "successful" : 46,
            "skipped" : 0
          },
          "hits" : {
            "hits" : [ ],
            "total" : 0,
            "max_score" : null
          },
          "took" : 20,
          "timed_out" : false,
          "aggregations" : {
            "host" : {
              "doc_count_error_upper_bound" : 0,
              "sum_other_doc_count" : 0,
              "buckets" : [ ]
            }
          }
        },
        "search" : {
          "request" : {
            "search_type" : "query_then_fetch",
            "indices" : [
              "metricbeat-*"
            ],
            "types" : [
              "filesystem"
            ],
            "rest_total_hits_as_int" : true,
            "body" : {
              "aggs" : {
                "host" : {
                  "terms" : {
                    "field" : "host.hostname",
                    "order" : {
                      "disk_usage" : "desc"
                    }
                  },
                  "aggs" : {
                    "disk_usage" : {
                      "max" : {
                        "field" : "system.filesystem.used.pct"
                      }
                    }
                  }
                }
              },
              "query" : {
                "bool" : {
                  "filter" : [
                    {
                      "range" : {
                        "@timestamp" : {
                          "gte" : "now-15m"
                        }
                      }
                    },
                    {
                      "range" : {
                        "disk_usage" : {
                          "gte" : "0.8"
                        }
                      }
                    }
                  ]
                }
              }
            }
          }
        }
      },
      "condition" : {
        "type" : "compare",
        "status" : "success",
        "met" : false,
        "compare" : {
          "resolved_values" : {
            "ctx.payload.hits.total" : 0
          }
        }
      },
      "actions" : [ ]
    },
    "messages" : [ ]
  }
}

So, the interesting part here is result.input.payload, which contains the search response. It shows that no hits were found (hits.total is 0). This means the condition is false, and thus nothing is triggered.

Have you tried extracting the query from the watch and rewriting it until it matches documents? For example, depending on the Elasticsearch version there is no need for types anymore.
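One likely culprit: the second range filter in the original watch targets `disk_usage`, which is the name of the max aggregation rather than a document field, so it can never match anything. Running the input search standalone in Dev Tools makes this easy to verify; a sketch against the actual field (using the same window and threshold the watch resolves to):

```
GET metricbeat-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "range": { "@timestamp": { "gte": "now-15m" } } },
        { "range": { "system.filesystem.used.pct": { "gte": 0.8 } } }
      ]
    }
  }
}
```

If this returns hits directly but the watch still reports zero, the problem is in the watch definition rather than the data.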

Alexander,

Thank you for all of your help. I spent a few hours going between your blog and my watch. I managed to re-craft the watch completely, and your advice on how to speed up testing and development was a godsend. For those coming along behind, and for any critique or comments, here is my watch now, which works correctly:

{
  "trigger": {
    "schedule": {
      "interval": "12h"
    }
  },
  "input": {
    "search": {
      "request": {
        "search_type": "query_then_fetch",
        "indices": [
          "metricbeat-*"
        ],
        "rest_total_hits_as_int": true,
        "body": {
          "size": 0,
          "query": {
            "bool": {
              "filter": [
                {
                  "range": {
                    "@timestamp": {
                      "from": "now-15m"
                    }
                  }
                },
                {
                  "range": {
                    "system.filesystem.used.pct": {
                      "from": 0.85
                    }
                  }
                }
              ],
              "must": [
                {
                  "match_phrase": {
                    "system.filesystem.mount_point": "/cmdb"
                  }
                }
              ]
            }
          },
          "aggs": {
            "by_host": {
              "terms": {
                "field": "host.hostname",
                "size": "100"
              }
            },
            "by_disk": {
              "terms": {
                "field": "system.filesystem.mount_point",
                "size": "100"
              }
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total": {
        "gt": 0
      }
    }
  },
  "actions": {
    "email_1": {
      "email": {
        "profile": "standard",
        "to": [
          "nobody@nowhere.com"
        ],
        "subject": "CMDB is above 85% on | {{#ctx.payload.aggregations.by_host.buckets}}{{key}} |{{/ctx.payload.aggregations.by_host.buckets}}.",
        "body": {
          "text": "CMDB is above 85% on | {{#ctx.payload.aggregations.by_host.buckets}}{{key}} |{{/ctx.payload.aggregations.by_host.buckets}}."
        }
      }
    }
  }
}

Glad you got it working! Thanks for digging through all of this!