First of all, metric analysis (using functions like `max`, `min`, `avg`, etc.) can handle the data being sparse within a `bucket_span`, and even sparse across `bucket_span`s. In other words, if you have a detector with `max(processing_time)` and a 5-minute `bucket_span`, but data only arrives every 10 minutes, it will still work.
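As a minimal sketch of what such a job could look like (the job ID and the `processing_time`/`@timestamp` field names here are assumptions, not taken from your data):

```
PUT _ml/anomaly_detectors/processing_time_sparse
{
  "analysis_config": {
    // 5-minute buckets; sparse input (e.g. one document every 10 minutes) is still handled
    "bucket_span": "5m",
    "detectors": [
      {
        "function": "max",
        "field_name": "processing_time"
      }
    ]
  },
  "data_description": {
    "time_field": "@timestamp"
  }
}
```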
But, given your sample data, I have two major questions:

- Do you plan to split the data for each entity (something like `host.name`)? If so, see the sketch after this list.
- Why do you feel compelled to put these all in the same ML job? Why not have several jobs, one for each data "type"?
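If you do want a per-entity split, one common approach (under the same assumptions about field names as above) is to add `partition_field_name` to the detector, so each `host.name` gets its own baseline:

```
PUT _ml/anomaly_detectors/processing_time_by_host
{
  "analysis_config": {
    "bucket_span": "5m",
    "detectors": [
      {
        "function": "max",
        "field_name": "processing_time",
        // hypothetical split field; substitute whatever identifies an entity in your data
        "partition_field_name": "host.name"
      }
    ]
  },
  "data_description": {
    "time_field": "@timestamp"
  }
}
```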