Can we use "sub-partitioning" in ML?


(Vitaly) #1

Our goal is similar to the one discussed in the thread "Machine learning - host stopped sending logs or events": we're trying to send alerts when certain hosts or components stop sending data, or when there is an anomaly in data volume.
There are about 5 indices, with about 30 log types and tens of hosts.
We created a multi-metric count job, using "type" as partition_field_name. But what about per-host issues? Should we use the "host" field as an influencer, or can we use "host" as a "sub-partition"?

Thank you,

(rich collier) #2


You could consider making an Advanced Job - with a config like:

function: low_count
by_field_name: type
partition_field_name: host

This will effectively give you a double-split - "for every host, look at the low count of data per type"
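
As a rough illustration of the config above, here is a minimal sketch of what the job body might look like when written out in full. The bucket span, the time field name, the description and the choice of influencers are assumptions for the example, not something stated in this thread; depending on your mapping you may need a keyword field such as "type.keyword" instead of "type".

```python
import json


def build_double_split_job(bucket_span="15m", time_field="@timestamp"):
    """Build an anomaly detection job body with a by/partition double split.

    bucket_span and time_field are assumed defaults; adjust to your data.
    """
    return {
        "description": "Low event count per type, for every host",
        "analysis_config": {
            "bucket_span": bucket_span,
            "detectors": [
                {
                    # One detector, split twice: by type within each host.
                    "function": "low_count",
                    "by_field_name": "type",
                    "partition_field_name": "host",
                }
            ],
            # Assumption: listing the split fields as influencers as well.
            "influencers": ["host", "type"],
        },
        "data_description": {"time_field": time_field},
    }


if __name__ == "__main__":
    # Print the body so it can be pasted into a job-creation request.
    print(json.dumps(build_double_split_job(), indent=2))
```

The printed JSON is meant to be used as the body of the job-creation request (or pasted into the Advanced job editor).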


If you add the host name to a "tags" field then you can do an advanced job and just set the datafeed to query for that tag so that each job will essentially be a partition.
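
A sketch of that one-job-per-host idea, assuming the host name really is stored in a "tags" field. The job id pattern, the index pattern and the host names below are hypothetical and only there to make the example complete.

```python
import json


def build_host_datafeed(job_id, host, indices):
    """Build a datafeed body that only feeds documents tagged with one host."""
    return {
        "job_id": job_id,
        "indices": list(indices),
        # Filter on the tag so this job only ever sees one host's data.
        "query": {"bool": {"filter": [{"term": {"tags": host}}]}},
    }


if __name__ == "__main__":
    # Hypothetical host names and index pattern.
    for host in ["web-01", "web-02"]:
        body = build_host_datafeed("low-count-" + host, host, ["logs-*"])
        print(json.dumps(body, indent=2))
```

The trade-off is that you end up managing one job and one datafeed per host, instead of a single job split by a field.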

(Vitaly) #4

I'll totally understand if your answer is "RTFM", but I'll ask anyway:

  1. What is the difference between using the Multi Metric job wizard with partition_field_name="type.keyword" and "host" as an influencer, versus creating an Advanced job and using by_field_name as you suggested?
    I.e., will the second option provide better granularity and/or allow us to create more usable alerts?
  2. Does "low_count" have any advantage over plain "count" for detecting anomalies in the number of events?
    I.e., can one ML job with "count" replace two jobs with "low_count" and "high_count"?
    Thank you again,

(rich collier) #5
  1. The multi-metric job wizard cannot combine a by_field_name with a partition_field_name. It only does partition_field_name. Choosing an influencer does not affect the statistical modeling (i.e. it won't split the modeling): possible influencers are evaluated only after an anomaly has been detected.

  2. A count job is equivalent to low_count and high_count within the same job. You'd pick low_count if you were ONLY interested in a drop in events.
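
To make the sidedness in point 2 concrete, here is a toy sketch. It is not how the ML model computes anomalies; it assumes you already have an expected range (lower, upper) for a bucket and only shows which direction each function reacts to. The function name is_flagged and the numbers are made up for illustration.

```python
def is_flagged(function, actual, lower, upper):
    """Toy check: does this count function react to the observed value?

    lower/upper stand in for the range the model would consider normal.
    """
    if function == "low_count":
        return actual < lower  # only drops
    if function == "high_count":
        return actual > upper  # only spikes
    if function == "count":
        return actual < lower or actual > upper  # both directions
    raise ValueError("unknown function: " + function)


if __name__ == "__main__":
    # Made-up bucket counts against a made-up normal range of 10 to 100.
    for actual in (3, 50, 400):
        results = {
            f: is_flagged(f, actual, 10, 100)
            for f in ("low_count", "high_count", "count")
        }
        print(actual, results)
```

In this toy, "count" fires exactly when either "low_count" or "high_count" would, which is the sense in which one count detector covers both.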

(Vitaly) #6

thank you * 2!
