How do you aggregate multiple events coming from different logs with slightly different timestamps, when the timestamp is the only field available to combine them?

First of all, metric analysis (using functions like max, min, avg, etc.) can handle data that is sparse within a bucket_span, and even sparse across bucket_spans. In other words, if you have a detector with max(processing_time) and a 5-minute bucket_span, but data only arrives every 10 minutes, it will still work.
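
As a rough sketch, a job like that could look like the following. This assumes hypothetical field names (processing_time, host.name, @timestamp) and a made-up job name; adjust them to your actual mappings:

```json
PUT _ml/anomaly_detectors/processing_time_job
{
  "analysis_config": {
    "bucket_span": "5m",
    "detectors": [
      {
        "function": "max",
        "field_name": "processing_time",
        "by_field_name": "host.name"
      }
    ]
  },
  "data_description": {
    "time_field": "@timestamp"
  }
}
```

The by_field_name line is only relevant if you do want to split the analysis per entity (see question 1 below); drop it otherwise.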

But, given your sample data, I have two major questions:

  1. Do you plan to split the data for each entity (by something like host.name)?
  2. Why do you feel compelled to put all of these in the same ML job? Why not have several jobs, one for each data "type"?