Hi,
I'm trying to create a datafeed for an ML job that would detect suspicious egress traffic per host.
PUT _xpack/ml/datafeeds/datafeed-network_out_deriv/
{
  "job_id": "network_out_deriv",
  "indices": [
    "metricbeat-*"
  ],
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "host.name": "qa-control"
          }
        }
      ],
      "must": {
        "exists": {
          "field": "system.network.out.bytes"
        }
      }
    }
  },
  "aggregations": {
    "buckets": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "10s",
        "time_zone": "UTC"
      },
      "aggregations": {
        "@timestamp": {
          "max": {
            "field": "@timestamp"
          }
        },
        "host.name": {
          "terms": {
            "field": "host.name"
          }
        },
        "network_out": {
          "max": {
            "field": "system.network.out.bytes"
          }
        },
        "network_out_deriv": {
          "derivative": {
            "buckets_path": "network_out"
          }
        }
      }
    }
  }
}
The datafeed above works, but as you can see the hostname is hardcoded in the query. The problem is that I don't know how to add another aggregation for hostnames (and partition the ML job by it) while still using the derivative aggregation. As soon as I add another aggregation, the derivative stops working, see below:
GET /_search
{
  "aggregations": {
    "buckets": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1000m",
        "time_zone": "UTC"
      },
      "aggregations": {
        "@timestamp": {
          "max": {
            "field": "@timestamp"
          }
        },
        "aggs": {
          "terms": {
            "field": "host.name"
          },
          "aggs": {
            "network_out": {
              "max": {
                "field": "system.network.out.bytes"
              }
            },
            "network_out_deriv": {
              "derivative": {
                "buckets_path": "network_out"
              }
            }
          }
        }
      }
    }
  }
}
I receive this error:
"type" : "illegal_state_exception",
"reason" : "derivative aggregation [network_out_deriv] must have a histogram, date_histogram or auto_date_histogram as parent"
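Going by that message, the derivative seems to need the date_histogram as its direct parent, whereas in the query above its parent is the terms aggregation. One arrangement that might satisfy this is to swap the nesting: put the terms aggregation on host.name on the outside and the date_histogram inside it, so the derivative is computed per host. The following is only a sketch under that assumption. The aggregation name "hosts" is made up for illustration, "size": 0 just suppresses the hits, and I have not confirmed that a datafeed accepts this shape:
GET metricbeat-*/_search
{
  "size": 0,
  "aggregations": {
    "hosts": {
      "terms": {
        "field": "host.name"
      },
      "aggregations": {
        "buckets": {
          "date_histogram": {
            "field": "@timestamp",
            "interval": "10s",
            "time_zone": "UTC"
          },
          "aggregations": {
            "@timestamp": {
              "max": {
                "field": "@timestamp"
              }
            },
            "network_out": {
              "max": {
                "field": "system.network.out.bytes"
              }
            },
            "network_out_deriv": {
              "derivative": {
                "buckets_path": "network_out"
              }
            }
          }
        }
      }
    }
  }
}
With this nesting, network_out and network_out_deriv are siblings directly under the date_histogram, which is what the error message asks for, and each host gets its own series of buckets.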
If anybody could give me some tips on how to proceed, I would really appreciate it.