Machine learning anomaly correlation


Is there a way to achieve the following with elastic machine learning:

A sample demo structure:

The index data has the following fields: job_duration_time, server, boot_time, run_time

The first field, job_duration_time, is the sum of the last two: job_duration_time = boot_time + run_time

I would like to:

  1. Find anomalies in job_duration_time by server. (I know how to implement this: a multi-metric job checking the median of job_duration_time, split by server.)
  2. Find the root cause, meaning: find which of boot_time/run_time correlates with the anomaly found in step 1.

example for such correlation:

Why not track all 3 metrics, per server in the job?
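
As a rough illustration of that suggestion, here is a minimal sketch of an anomaly detection job configuration with one detector per metric, each partitioned by server. The field names come from the question; the `median` function follows the job described there; the `15m` bucket span and the `@timestamp` time field are assumptions, so adjust them to the real index. The helper `build_job_config` is hypothetical and only builds the JSON body, it does not call the cluster.

```python
import json

# Metric and partition field names are taken from the question.
METRICS = ["job_duration_time", "boot_time", "run_time"]


def build_job_config(bucket_span="15m", time_field="@timestamp"):
    """Build a job body with one median detector per metric, split by server.

    bucket_span and time_field are assumed values, not taken from the thread.
    """
    detectors = [
        {
            "function": "median",
            "field_name": metric,
            "partition_field_name": "server",
        }
        for metric in METRICS
    ]
    return {
        "analysis_config": {
            "bucket_span": bucket_span,
            "detectors": detectors,
            "influencers": ["server"],
        },
        "data_description": {"time_field": time_field},
    }


job_config = build_job_config()
print(json.dumps(job_config, indent=2))
```

With all three detectors in the same job, every metric is scored per server in the same time buckets, which makes the scores directly comparable.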

thanks for the reply @richcollier

If each data point represents a minute, for example, then theoretically the anomaly in the first metric could occur 1 minute after the anomaly in the second metric.
I'm not a statistician, but I think you should use a formula to find the correlation (e.g. Pearson correlation).

Pearson Correlation tells you how related (in a linear sense) two variables are on average. You need many observations (irrespective of time). This doesn't make sense in the context of time-series based data where "correlation" really means that something co-occurs in time. Keep in mind that we bucket the data in time (hence the meaningfulness of the bucket_span parameter).

In your case, you have 3 metrics where the 3rd is the sum of the first two, so naturally, you will get time-correlation (I disagree with your assertion that there is a 1 sample delay of anomalousness). Looking at the anomaly scores of the 3 time-series will allow you to infer causality. In other words, if metric3 = metric1 + metric2, then when metric3 is odd it is very likely that metric1, metric2, or both will also be odd in the same bucket. The scores of metric1 and metric2 are a proxy for which one is most responsible.
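
The score comparison described above can be sketched roughly as follows. The record keys (`timestamp`, `partition_field_value`, `field_name`, `record_score`) mirror the fields found in anomaly record results; the sample records, the threshold of 50, and the helper `likely_causes` are invented for illustration and are not part of any Elastic API.

```python
from collections import defaultdict

# Invented sample anomaly records, one per (bucket, server, metric).
records = [
    {"timestamp": 1000, "partition_field_value": "srv-a",
     "field_name": "job_duration_time", "record_score": 91.0},
    {"timestamp": 1000, "partition_field_value": "srv-a",
     "field_name": "boot_time", "record_score": 12.0},
    {"timestamp": 1000, "partition_field_value": "srv-a",
     "field_name": "run_time", "record_score": 88.0},
    {"timestamp": 2000, "partition_field_value": "srv-b",
     "field_name": "job_duration_time", "record_score": 75.0},
    {"timestamp": 2000, "partition_field_value": "srv-b",
     "field_name": "boot_time", "record_score": 80.0},
]


def likely_causes(records, total="job_duration_time", threshold=50.0):
    """For each (server, bucket) where the total is anomalous, return the
    component metric with the highest score in that same bucket."""
    by_bucket = defaultdict(dict)
    for r in records:
        key = (r["partition_field_value"], r["timestamp"])
        scores = by_bucket[key]
        # Keep the highest score seen for each metric in this bucket.
        name = r["field_name"]
        scores[name] = max(r["record_score"], scores.get(name, 0.0))

    causes = {}
    for key, scores in by_bucket.items():
        if scores.get(total, 0.0) < threshold:
            continue  # the total is not anomalous enough in this bucket
        parts = {n: s for n, s in scores.items() if n != total}
        causes[key] = max(parts, key=parts.get) if parts else None
    return causes


causes = likely_causes(records)
for (server, ts), metric in sorted(causes.items()):
    print(server, ts, metric)
```

Because the comparison is done per bucket, it relies on the component anomaly landing in the same bucket as the anomaly in the total, which is the co-occurrence argument made above.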

thanks @richcollier, i will try and let you know :slight_smile: