Machine learning can't detect a changed pattern in the log rate

Hello, I created a machine learning job using the API. I'm interested in high and low log rates for Kubernetes containers, and we are trying to detect unusually low or high log rates when they occur.

Here is the ML job definition:

```
PUT _ml/anomaly_detectors/foo-ml-monitor-logs
{
  "description": "Monitor unusual log rate, missing or high log rate",
  "groups": [
    "foo"
  ],
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "function": "high_count",
        "partition_field_name": "kubernetes.container.name.keyword",
        "detector_description": "high_count on partition field kubernetes.container.name.keyword"
      },
      {
        "function": "low_count",
        "partition_field_name": "kubernetes.container.name.keyword",
        "detector_description": "low_count partition_field_name=\"kubernetes.container.name.keyword\""
      }
    ],
    "influencers": []
  },
  "data_description": {
    "time_field": "@timestamp"
  },
  "model_plot_config": {
    "enabled": false,
    "annotations_enabled": true
  },
  "results_index_name": "foo-ml-monitor-logs",
  "analysis_limits": {
    "model_memory_limit": "13MB"
  }
}
```

After processing the documents, here is the result for a container flagged with high severity:

We can see a seasonal pattern: the log rate is low during the night. What we are interested in is the last part (circled in red), where the pattern changes, i.e. there is no more seasonality, and indeed we faced a log interruption during that time. How can I update the model to catch this kind of behavior? Is it possible with ML?

There's not quite enough information to tell what has transpired here, but the ML job should certainly (and very easily) catch this situation. Since you have a partition_field_name defined, the graph in your screenshot is for one particular kubernetes.container.name.keyword value.
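Because the job is partitioned, each anomaly record should carry the container name in its `partition_field_value` field. A minimal sketch for listing the `low_count` records of one container, assuming the results can be searched directly through the `.ml-anomalies-*` index pattern and that `my-container` is a hypothetical placeholder for the container from the screenshot:

```
# Sketch only: "my-container" is a hypothetical placeholder value
GET .ml-anomalies-*/_search
{
  "size": 20,
  "sort": [
    { "record_score": "desc" }
  ],
  "query": {
    "bool": {
      "filter": [
        { "term": { "job_id": "foo-ml-monitor-logs" } },
        { "term": { "result_type": "record" } },
        { "term": { "function": "low_count" } },
        { "term": { "partition_field_value": "my-container" } }
      ]
    }
  }
}
```

If no record covers the interruption, that points at the modelling rather than at how the results are displayed.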

It would be good to create another test job just for that kubernetes.container.name.keyword value, and also set "enabled": true in model_plot_config. Then run the job only from April 1, 2022 up through the problematic period and see what it looks like.
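A minimal sketch of such a test job in Console syntax. The job and datafeed IDs, the `logs-*` index pattern, the container name `my-container` and the end timestamp are all hypothetical placeholders. This sketch assumes the single container is selected with a term query in the datafeed, so the partition field is dropped from the detectors:

```
# Hypothetical test job: same count detectors, model plot enabled
PUT _ml/anomaly_detectors/foo-ml-monitor-logs-test
{
  "description": "Test: log rate for a single container",
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      { "function": "high_count" },
      { "function": "low_count" }
    ]
  },
  "data_description": {
    "time_field": "@timestamp"
  },
  "model_plot_config": {
    "enabled": true
  }
}

# Hypothetical datafeed: replace "logs-*" and "my-container" with real values
PUT _ml/datafeeds/datafeed-foo-ml-monitor-logs-test
{
  "job_id": "foo-ml-monitor-logs-test",
  "indices": ["logs-*"],
  "query": {
    "term": { "kubernetes.container.name.keyword": "my-container" }
  }
}

POST _ml/anomaly_detectors/foo-ml-monitor-logs-test/_open

# "end" is a placeholder: use a time just after the interruption
POST _ml/datafeeds/datafeed-foo-ml-monitor-logs-test/_start
{
  "start": "2022-04-01T00:00:00Z",
  "end": "2022-05-01T00:00:00Z"
}
```

Once the datafeed has processed that range, the Single Metric Viewer should show the model bounds next to the actual counts, which makes it easier to judge whether the low_count detector scored the interruption.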
