Watch looking for mounts disappearing per host

alerting

(Rob) #1

I have a watch that I copied and tweaked from another watch that I know works. The working watch only looks at a DB and does not need to look at multiple hosts. The watch I am trying to use is as follows:

{
  "trigger": {
    "schedule": {
      "interval": "4m"
    }
  },
  "input": {
    "search": {
      "request": {
        "search_type": "query_then_fetch",
        "indices": [
          "metricbeat-*"
        ],
        "types": [],
        "body": {
          "size": 50,
          "query": {
            "bool": {
              "filter": [
                {
                  "range": {
                    "@timestamp": {
                      "from": "now-5m",
                      "to": "now"
                    }
                  }
                }
              ],
              "must": [
                {
                  "match": {
                    "fields.team": "MyTeamName"
                  }
                },
                {
                  "exists": {
                    "field": "system.filesystem.mount_point"
                  }
                }
              ]
            }
          },
          "_source": {
            "excludes": []
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total": {
        "lt": 1
      }
    }
  },

I have an action to send an email, which should work. What I cannot figure out is how to make this watch go through all of the hosts in Metricbeat that match the fields and alert (with the hostname) if any host does not have the "field": "system.filesystem.mount_point". I have looked into aggregations and assume that this is the route I have to take, but I do not understand them well enough to get it to work. I have added the following between the body.size section and the query section, but now my watch does not seem to run (I do not see any history for the watch even though it is active):

"aggs": {
  "per_host": {
    "aggs": {
      "per_minute": {
        "date_histogram": {
          "field": "@timestamp",
          "interval": "4m"
        },
      }
    },
    "terms": {
      "size": 100,
      "field": "beat.hostname"
    }
  }
},

There are over 80 hosts, and more will be added as time goes on, so creating one alert per host, as the infrastructure team has done for the infrastructure, would be unmanageable. Any info on how to get this done would be helpful. Thanks for the assistance.
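
One possible reason the modified watch never produces history: the aggregation fragment as posted has a comma after the closing brace of date_histogram with nothing following it, and strict JSON parsers reject trailing commas, so the updated watch may not have been saved at all. A minimal sketch of the search body with the aggregation as a sibling of size and query follows. It assumes a 6.x-style date_histogram "interval" (the one used in the post) and writes the range with gte/lte; everything else is taken from the watch above.

```json
{
  "size": 50,
  "query": {
    "bool": {
      "filter": [
        { "range": { "@timestamp": { "gte": "now-5m", "lte": "now" } } }
      ],
      "must": [
        { "match": { "fields.team": "MyTeamName" } },
        { "exists": { "field": "system.filesystem.mount_point" } }
      ]
    }
  },
  "aggs": {
    "per_host": {
      "terms": { "field": "beat.hostname", "size": 100 },
      "aggs": {
        "per_minute": {
          "date_histogram": { "field": "@timestamp", "interval": "4m" }
        }
      }
    }
  }
}
```

Note that while the exists clause stays in the query, a host that reports no mount-point documents gets no bucket at all rather than an empty one, so this body alone cannot name the hosts that are missing mounts.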


(Rob) #2

To update this: I believe I now have the correct syntax, but I would like someone to review it and let me know if I am missing something. When I simulate the watch, I get back the hostnames along with the mount points and the number of documents for each, which is what we should see. I would like it to alert if the mount points are no longer available (0 documents), which I think the following will do. I am currently working with the team in charge of the environment to see if they can unmount the mount points on one of the test systems, which will tell me whether this works. Here is what I have so far:

{
  "trigger": {
    "schedule": {
      "interval": "4m"
    }
  },
  "input": {
    "search": {
      "request": {
        "search_type": "query_then_fetch",
        "indices": [
          "metricbeat-*"
        ],
        "types": [],
        "body": {
          "size": 50,
          "query": {
            "bool": {
              "must": [
                {
                  "match_all": {}
                },
                {
                  "match_phrase": {
                    "fields.team": {
                      "query": "MyTeamName"
                    }
                  }
                },
                {
                  "range": {
                    "@timestamp": {
                      "from": "now-5m",
                      "to": "now"
                    }
                  }
                }
              ],
              "must_not": []
            }
          },
          "_source": {
            "excludes": []
          },
          "aggs": {
            "2": {
              "terms": {
                "field": "beat.hostname",
                "size": 50,
                "order": {
                  "_count": "desc"
                }
              },
              "aggs": {
                "3": {
                  "terms": {
                    "field": "system.filesystem.mount_point",
                    "size": 5,
                    "order": {
                      "_count": "desc"
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total": {
        "lt": 1
      }
    }
  },

Any info would be of great use. Thank you for your time and assistance with this.
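
A few things in the watch above look like they would keep it from alerting per host. The compare condition on ctx.payload.hits.total only fires when no documents match for the whole team, not when one host loses its mounts. The terms aggregation on beat.hostname has "size": 50, which would cover only part of 80+ hosts. The sketch below is one way to check each host instead, under these assumptions: Elasticsearch 6.x Watcher with Painless script conditions using the "source" key; the aggregation names per_host and with_mount, the action name email_team, the address team@example.com, and the terms size of 200 are all hypothetical. It counts mount-point documents per host with a filter sub-aggregation, fires if any host has zero, and passes the affected hostnames to the email body.

```json
{
  "trigger": {
    "schedule": { "interval": "4m" }
  },
  "input": {
    "search": {
      "request": {
        "indices": ["metricbeat-*"],
        "body": {
          "size": 0,
          "query": {
            "bool": {
              "filter": [
                { "match_phrase": { "fields.team": "MyTeamName" } },
                { "range": { "@timestamp": { "gte": "now-5m", "lte": "now" } } }
              ]
            }
          },
          "aggs": {
            "per_host": {
              "terms": { "field": "beat.hostname", "size": 200 },
              "aggs": {
                "with_mount": {
                  "filter": { "exists": { "field": "system.filesystem.mount_point" } }
                }
              }
            }
          }
        }
      }
    }
  },
  "condition": {
    "script": {
      "source": "return ctx.payload.aggregations.per_host.buckets.stream().anyMatch(h -> h.with_mount.doc_count == 0)"
    }
  },
  "transform": {
    "script": {
      "source": "return ['hosts': ctx.payload.aggregations.per_host.buckets.stream().filter(h -> h.with_mount.doc_count == 0).map(h -> h.key).collect(Collectors.toList())]"
    }
  },
  "actions": {
    "email_team": {
      "email": {
        "to": "team@example.com",
        "subject": "Hosts reporting no mount points",
        "body": "No system.filesystem.mount_point documents in the last 5 minutes for: {{#ctx.payload.hosts}}{{.}} {{/ctx.payload.hosts}}"
      }
    }
  }
}
```

This only catches hosts that are still sending other Metricbeat documents in the window. A host that stops reporting entirely produces no bucket and would need a separate check, for example against a known host list.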

