Our goal is similar to what was discussed in the thread "Machine learning - host stopped sending logs or events": we want to send alerts when certain hosts or components stop sending data, or when there is an anomaly in data volume.
We have about 5 indices, roughly 30 log types, and tens of hosts.
We created a multi-metric count job using "type" as the partition_field_name. But what about per-host issues? Should we use the "host" field as an influencer, or can we use "host" as a "sub-partition"?
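For reference, the job we have today is roughly equivalent to the following (a sketch only; the job id, field names, and bucket span are approximate, and the `_ml` path assumes a 7.x-style cluster, older versions use `_xpack/ml`):

```
# job id, field names, and bucket span below are placeholders
PUT _ml/anomaly_detectors/log-type-count
{
  "description": "Event count per log type",
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      { "function": "count", "partition_field_name": "type.keyword" }
    ],
    "influencers": [ "type.keyword" ]
  },
  "data_description": {
    "time_field": "@timestamp"
  }
}
```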
If you add the host name to a "tags" field, then you can create an advanced job and set its datafeed to query for just that tag, so that each job effectively acts as its own partition.
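For example, the datafeed for one such per-host job might look roughly like this (the job id, index pattern, and tag value are placeholders, not from your setup):

```
# placeholder job id, index pattern, and tag value
PUT _ml/datafeeds/datafeed-host-web01-count
{
  "job_id": "host-web01-count",
  "indices": [ "logs-*" ],
  "query": {
    "term": { "tags": "web01" }
  }
}
```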
Rich,
I'll totally understand if your answer is "RTFM", but I'll ask anyway:
What is the difference between using the Multi-Metric job wizard with partition_field_name="type.keyword" plus the "host" field as an influencer, versus going to an Advanced job and using by_field_name as you suggested?
I.e. will the second option provide better granularity and/or allow us to create more usable alerts?
Does "low count" have advantages over plain "count" for detecting anomalies in the number of events?
I.e. can one ML job with "count" replace two jobs, one with "low count" and one with "high count"?
Thank you again,
Vitaly
The multi-metric job cannot do a by_field_name and a partition_field_name; it only does partition_field_name. The choice of an influencer does not affect the statistical modeling (i.e. it won't split the modeling); possible influencers are determined after an anomaly has already been found.
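If you want both splits in a single job, the advanced job (or the create-job API) lets you combine them in one detector, along these lines (the field names and bucket span are examples only, adjust them to your mapping):

```
# field names and bucket span are placeholders
PUT _ml/anomaly_detectors/count-by-host-per-type
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "function": "count",
        "by_field_name": "host.keyword",
        "partition_field_name": "type.keyword"
      }
    ],
    "influencers": [ "host.keyword", "type.keyword" ]
  },
  "data_description": {
    "time_field": "@timestamp"
  }
}
```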
A count function is equivalent to having both low_count and high_count within the same job. You'd pick low_count if you were ONLY interested in a drop in events.
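In other words, only the function name in the detector changes; for example, to alert only on drops you would swap the detector shown above for something like this sketch (same placeholder field name):

```
"detectors": [
  { "function": "low_count", "partition_field_name": "type.keyword" }
]
```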