I was wondering if anyone has ideas on how to detect correlations between events and resource usage.
I have a visualization that shows the top 10 processes by memory usage on a system. It also has annotations that mark when new processes start on the system.
Today, I use this visualization to manually look for correlations between processes starting and the memory utilization of other processes.
I'd like to move this to more of a machine learning setup, as manually correlating things over long periods with potentially hundreds of different dimensions isn't really scalable.
What I'd like is something that can detect frequent patterns between a set of events and historical data (metrics) to find correlations and, if possible, the strength of those correlations. For example:
- If process A starts and process B's memory usage frequently/consistently increases shortly afterward, I'd like something to say: Process A starting strongly correlates to Process B's memory usage.
- If process C starts but is only occasionally followed by an increase in process B's memory usage, I'd like something to say: Process C weakly correlates, or doesn't correlate at all, to Process B's memory usage.
- I don't care about cases like process A starting and then process A's own memory usage increasing. That is always true (it's causation, not just correlation) and provides little valuable insight.
- While I'm using process start times in my example, I'd like any sort of event to be usable, for example a query execution starting.
- Likewise, while I'm using process memory consumption in my example, I'd like any sort of historical metric to be usable, for example CPU, memory (RAM), or DiskIO.
- A nice-to-have would be cross-correlation between metrics as well: does an increase in process A's memory usage correlate with an increase in process B's DiskIO?
- Another nice-to-have would be sub-correlations (I'm not sure that's the right term), for example:
  - If process A starts and is consistently followed shortly by query Z, which is in turn shortly followed by process B using more memory, I'd like something to say: Process A followed by Query Z strongly correlates to Process B's memory usage.
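One way to make "strongly" versus "weakly correlates" concrete is to treat each event as a trigger and measure how often a given metric rises within a fixed window afterward (hit rate), compared with how often it rises at arbitrary times (lift). The sketch below is plain Python over data assumed to already be exported from Elasticsearch; it is not an Elastic feature, and the function names, the `window`/`threshold` parameters, the `(time, name, source)` event shape, and the sample data are all hypothetical. Events whose source process owns the metric are skipped, which covers the self-correlation case.

```python
from bisect import bisect_left
from collections import defaultdict


def value_at(series, t):
    """First sample value at or after time t; series is a time-sorted list of (time, value)."""
    i = bisect_left(series, (t,))  # (t,) sorts before any (t, value) tuple
    return series[i][1] if i < len(series) else None


def rose_after(series, t, window, threshold):
    """True if the metric rose by at least `threshold` within `window` after t; None if no data."""
    before, after = value_at(series, t), value_at(series, t + window)
    if before is None or after is None:
        return None
    return after - before >= threshold


def event_metric_scores(events, metrics, window, threshold):
    """Score every (event name, metric owner) pair.

    events:  list of (time, event_name, source_process or None)
    metrics: dict of process -> time-sorted list of (time, value)
    Returns {(event_name, process): (hit_rate, lift, n_events)}.
    """
    scores = {}
    for proc, series in metrics.items():
        # Baseline: how often this metric rises over any window, event or not.
        checks = (rose_after(series, t, window, threshold) for t, _ in series)
        base = [r for r in checks if r is not None]
        if not base:
            continue
        base_rate = sum(base) / len(base)
        outcomes = defaultdict(list)
        for t, name, source in events:
            if source == proc:
                continue  # skip self-correlation, e.g. A starting vs. A's own memory
            r = rose_after(series, t, window, threshold)
            if r is not None:
                outcomes[name].append(r)
        for name, rs in outcomes.items():
            hit_rate = sum(rs) / len(rs)
            # lift > 1: the rise happens more often after this event than at random times.
            # None when the metric never rises at all (baseline of zero).
            lift = hit_rate / base_rate if base_rate > 0 else None
            scores[(name, proc)] = (hit_rate, lift, len(rs))
    return scores


# Hypothetical samples at 1-second intervals: B's memory jumps after each "process A started".
b_mem = list(enumerate([100] * 3 + [130] + [160] * 9 + [190] + [220] * 7))
a_mem = [(t, 50) for t in range(21)]
events = [
    (2, "process A started", "A"),
    (6, "process C started", "C"),
    (12, "process A started", "A"),
]
scores = event_metric_scores(events, {"B": b_mem, "A": a_mem}, window=2, threshold=20)
for (event, proc), (hit_rate, lift, n) in sorted(scores.items()):
    print(f"{event!r} -> {proc} memory: hit_rate={hit_rate:.2f}, lift={lift}, n={n}")
```

A hit rate near 1 with lift well above 1 would map to "strongly correlates", while a low hit rate or a lift near 1 would map to "weakly or not at all". The count `n` matters too: a perfect hit rate over two occurrences is much weaker evidence than the same rate over two hundred.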
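For the metric-to-metric cross-correlation case, a common starting point is lagged cross-correlation: compute the Pearson correlation between changes in one series and changes in another series shifted by k samples, then keep the lag with the strongest correlation. This minimal sketch assumes both metrics have already been resampled onto the same fixed interval, and it uses first differences so it compares increases rather than absolute levels. The function names and sample data are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def diffs(values):
    """Step-to-step changes, so we correlate increases rather than absolute levels."""
    return [b - a for a, b in zip(values, values[1:])]


def best_lagged_correlation(a, b, max_lag):
    """Find the lag (in samples) at which changes in `a` best line up with later changes in `b`.

    a, b: equally long lists sampled at the same fixed interval.
    Returns (lag, r) for the lag with the largest |r|.
    """
    da, db = diffs(a), diffs(b)
    best_lag, best_r = 0, 0.0
    for lag in range(max_lag + 1):
        xs, ys = da[: len(da) - lag], db[lag:]
        if len(xs) < 3:
            break  # too few overlapping points for a meaningful correlation
        r = pearson(xs, ys)
        if abs(r) > abs(best_r):
            best_lag, best_r = lag, r
    return best_lag, best_r


# Hypothetical samples: process B's DiskIO echoes process A's memory jumps two samples later.
mem_a = [10, 10, 15, 15, 15, 18, 18, 18, 18, 25, 25, 25, 25, 25]
disk_b = [1, 1, 1, 1, 6, 6, 6, 9, 9, 9, 9, 16, 16, 16]
lag, r = best_lagged_correlation(mem_a, disk_b, max_lag=4)
print(lag, round(r, 3))
```

Correlating differences rather than raw values matters here: two series that both trend upward over a day will show a high raw correlation even if their short-term increases are unrelated. With hundreds of metric pairs, some pairs will also look strongly correlated purely by chance.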
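The sub-correlation idea resembles what is usually called sequential pattern mining. A simplified, pair-only version: turn every "X followed by Y within `max_gap` seconds" into a compound event timestamped at Y, and keep only pairs that occur at least `min_support` times. Those compound events can then be scored against metrics in the same way as single events. The function names, parameters, and event stream below are hypothetical, and longer chains (A, then Z, then another event) would need the same idea applied repeatedly.

```python
from collections import Counter


def compound_events(events, max_gap):
    """Turn "X, then Y within max_gap" into a single compound event.

    events: list of (time, name), sorted by time.
    Returns a list of (time_of_Y, "X -> Y").
    """
    out = []
    for i, (t1, first) in enumerate(events):
        for t2, second in events[i + 1:]:
            if t2 - t1 > max_gap:
                break  # sorted input: every later event is even further away
            if second != first:
                out.append((t2, f"{first} -> {second}"))
    return out


def frequent_sequences(events, max_gap, min_support):
    """Keep only the compound events that occur at least min_support times."""
    counts = Counter(name for _, name in compound_events(events, max_gap))
    return {name: n for name, n in counts.items() if n >= min_support}


# Hypothetical event stream (times in seconds).
events = [
    (0, "process A started"),
    (3, "query Z started"),
    (10, "process A started"),
    (12, "query Z started"),
    (30, "process C started"),
    (50, "query Z started"),
]
print(frequent_sequences(events, max_gap=5, min_support=2))
```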
I recognize this might be fairly complex to implement, and I wasn't able to find an existing way to do it in Elastic, so I'm not sure it's even possible.