I'm trying to set up an ML job that can track the servers we have deployed metricbeat/packetbeat to and alert us when one stops reporting. From what I understand, many of the ML functions work on numbers rather than strings. In our case we're looking for the total number of distinct beat.name values (the figure shown by the Number of Hosts [Metricbeat System] visualization), so using a count aggregation with beat.name as the field doesn't work, since the server names are strings, although I could very easily be misunderstanding this. The Number of Hosts [Metricbeat System] visualization does use beat.name, but it aggregates all of the values into a single number using what appears to be the cardinality aggregation. Overall, we're trying to be alerted when one of our servers has stopped reporting, and to know which one it is. That is the end goal and any help would be greatly appreciated. There are probably other ways to do this, so feel free to point me in another direction.
My thought process so far has led me down the road of potentially creating an array within the "field_name" value ("field_name" : "[]") that would produce a total number for us to work from. ML would then be able to use the low_count function on that number, which should stay constant at 96 for us, to give us a heads-up when it decreases, without alerting us as we add more servers. From what I can tell, though, this still won't help us figure out which server isn't working.
PUT _xpack/ml/anomaly_detectors/metricbeat_monitoring
{
"analysis_config": {
"detectors": [{
"function" : "low_count",
"field_name" : "["cardinality": {
"field": "beat.name"} //This doesn't work, obviously,
but something like this???//
]"
}]
},
"data_description": {
"time_field":"timestamp",
"time_format": "epoch_ms"
}
}
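A variation I'm considering but have not tested: as far as I can tell from the docs, ML has distinct-count functions that take a field directly, so no nested cardinality aggregation would be needed, and a second detector split on beat.name might show which host went quiet. The job name (metricbeat_host_tracking), the 15m bucket span, and the use of @timestamp as the time field are all my own assumptions here, and the job would still need a datafeed pointing at the metricbeat indices.

```json
# Untested sketch. Job id, bucket_span and time_field are assumptions.
# Detector 1: flags a drop in the number of distinct beat.name values.
# Detector 2: models the event count per beat.name, so an anomaly
#             should carry the name of the host that went quiet.
PUT _xpack/ml/anomaly_detectors/metricbeat_host_tracking
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "function": "low_distinct_count",
        "field_name": "beat.name"
      },
      {
        "function": "low_count",
        "partition_field_name": "beat.name"
      }
    ]
  },
  "data_description": {
    "time_field": "@timestamp"
  }
}
```

If this is roughly right, the first detector covers "the host count dropped" and the second covers "which one", but I'd appreciate confirmation that low_count treats a host with no documents in a bucket as a zero count.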
I've also been looking at Datafeeds and trying to figure out if setting up anything like that would work. Maybe even using the Watch API?
Below is the request JSON behind the Number of Hosts [Metricbeat System] visualization, from what I've found.
{
"size": 0,
"_source": {
"excludes": []
},
"aggs": {
"1": {
"cardinality": {
"field": "beat.name"
}
}
},
"version": true,
"stored_fields": [
"*"
],
"script_fields": {},
"docvalue_fields": [
"@timestamp",
"ceph.monitor_health.last_updated",
"docker.container.created",
"docker.healthcheck.event.end_date",
"docker.healthcheck.event.start_date",
"docker.image.created",
"kubernetes.container.start_time",
"kubernetes.event.metadata.timestamp.created",
"kubernetes.node.start_time",
"kubernetes.pod.start_time",
"kubernetes.system.start_time",
"mongodb.status.background_flushing.last_finished",
"mongodb.status.local_time",
"php_fpm.pool.start_time",
"postgresql.activity.backend_start",
"postgresql.activity.query_start",
"postgresql.activity.state_change",
"postgresql.activity.transaction_start",
"postgresql.bgwriter.stats_reset",
"postgresql.database.stats_reset",
"system.process.cpu.start_time"
],
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "*",
"analyze_wildcard": true,
"default_field": "*"
}
},
{
"query_string": {
"analyze_wildcard": true,
"default_field": "*",
"query": "*"
}
},
{
"range": {
"@timestamp": {
"gte": 1537896549877,
"lte": 1537897449877,
"format": "epoch_millis"
}
}
}
],
"filter": [],
"should": [],
"must_not": []
}
},
"highlight": {
"pre_tags": [
"@kibana-highlighted-field@"
],
"post_tags": [
"@/kibana-highlighted-field@"
],
"fields": {
"*": {}
},
"fragment_size": 2147483647
}
}
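For the "which one" part, my guess is that swapping the cardinality aggregation for a terms aggregation on the same field would list the host names that reported in a window, which could then be compared against the expected 96. This is only a sketch: the metricbeat-* index pattern, the 15-minute window, and the size of 200 are assumptions on my part.

```json
# Untested sketch. Index pattern, time window and size are assumptions.
# Returns one bucket per beat.name seen in the last 15 minutes;
# any expected host missing from the buckets has stopped reporting.
GET metricbeat-*/_search
{
  "size": 0,
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-15m"
      }
    }
  },
  "aggs": {
    "hosts": {
      "terms": {
        "field": "beat.name",
        "size": 200
      }
    }
  }
}
```

Something like this could presumably also serve as the input to a watch, if Watcher turns out to be the better fit than ML.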