ML partition by two fields


ES v7.13.2

I'm trying to create the following ML job and I'm not sure if I'm doing it correctly.

Indices: user_events*
Every document looks something like this:

    "date_time": "2021-07-16T01:00:00.000Z",
    "clientId": 123,
    "domainsGroup": "OrgName1",
    "event_key": "add_to_cart",
    "event": "Add to cart",
    "ok": false/true

I want to detect whether there is an increase or decrease in the number of failed goals (ok=false), per domainsGroup and event. In other words, I want to split the data by domainsGroup_event and then count, per hour, how many failed "Add to cart" events OrgName1 had, for example. If there is a peak in an hour, I want to take some action.

I tried many approaches and nothing worked, but specifically, I can't understand why the following doesn't work.
Note: every document must have all the fields mentioned above, and event, event_key, domainsGroup and clientId are all keywords.

Running this job gives me this error: "Datafeed is encountering errors extracting data: runtime error"

    PUT _ml/anomaly_detectors/hourly_goal_failures_anomalies_job_v1
    {
      "description": "goal failures anomalies job",
      "analysis_config": {
        "bucket_span": "1h",
        "detectors": [
          {
            "function": "count",
            "partition_field_name": "domainsGroup_event",
            "detector_description": "goal failures count"
          }
        ],
        "influencers": []
      },
      "analysis_limits": {
        "model_memory_limit": "500MB"
      },
      "data_description": {
        "time_field": "date_time",
        "time_format": "epoch_ms"
      },
      "model_snapshot_retention_days": 10,
      "daily_model_snapshot_retention_after_days": 1,
      "results_index_name": "",
      "allow_lazy_open": false,
      "groups": []
    }

    PUT _ml/datafeeds/datafeed-hourly_goal_failures_anomalies_job_v1
    {
      "query_delay": "5m",
      "query": {
        "bool": {
          "filter": [
            {
              "term": {
                "ok": false
              }
            }
          ]
        }
      },
      "indices": [
        "user_events*"
      ],
      "script_fields": {
        "domainsGroup_event": {
          "script": {
            "source": "doc['domainsGroup'].value + doc['event'].value",
            "lang": "painless"
          },
          "ignore_failure": true
        }
      },
      "scroll_size": 1000,
      "delayed_data_check_config": {
        "enabled": true
      },
      "job_id": "hourly_goal_failures_anomalies_job_v1",
      "datafeed_id": "datafeed-hourly_goal_failures_anomalies_job_v1"
    }

What do you see if you run the datafeed preview?

GET _ml/datafeeds/datafeed-hourly_goal_failures_anomalies_job_v1/_preview

What are the mappings for your index?

GET user_events*/_mapping

You should also know that you can get a double split naturally by using both partition_field_name and by_field_name; there's no need to artificially create a script_field.
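As a sketch (field names taken from the sample document above, not verified against your mapping), the detector could use event as the by field and domainsGroup as the partition field, so the datafeed no longer needs the script_field at all:

    "detectors": [
      {
        "function": "count",
        "by_field_name": "event",
        "partition_field_name": "domainsGroup",
        "detector_description": "goal failures count"
      }
    ]

With this shape, both event and domainsGroup also become natural candidates for the influencers list.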

