ML partition by two fields


ES v7.13.2

I'm trying to create the following ML job and I'm not sure if I'm doing it correctly.

Indices: user_events*
Every document looks something like this:

    "date_time": "2021-07-16T01:00:00.000Z",
    "clientId": 123,
    "domainsGroup": "OrgName1",
    "event_key": "add_to_cart",
    "event": "Add to cart",
    "ok": false/true

I want to detect whether there is an increase or decrease in the number of failed goals (ok=false), per domainsGroup and event. In other words, I want to split the data by domainsGroup_event and then count, per hour, how many failed "Add to cart" events OrgName1 had, for example. If there is a peak in an hour, I want to take some action.

I tried many approaches and nothing worked, but specifically, I can't understand why the following doesn't work.
Note: every document must have all the fields mentioned above, and event, event_key, domainsGroup and clientId are all keywords.

Running this job gives me this error: "Datafeed is encountering errors extracting data: runtime error"

    PUT _ml/anomaly_detectors/hourly_goal_failures_anomalies_job_v1
    {
      "description": "goal failures anomalies job",
      "analysis_config": {
        "bucket_span": "1h",
        "detectors": [
          {
            "function": "count",
            "partition_field_name": "domainsGroup_event",
            "detector_description": "goal failures count"
          }
        ],
        "influencers": []
      },
      "analysis_limits": {
        "model_memory_limit": "500MB"
      },
      "data_description": {
        "time_field": "date_time",
        "time_format": "epoch_ms"
      },
      "model_snapshot_retention_days": 10,
      "daily_model_snapshot_retention_after_days": 1,
      "results_index_name": "",
      "allow_lazy_open": false,
      "groups": []
    }

    PUT _ml/datafeeds/datafeed-hourly_goal_failures_anomalies_job_v1
    {
      "query_delay": "5m",
      "query": {
        "bool": {
          "filter": [
            {
              "term": {
                "ok": false
              }
            }
          ]
        }
      },
      "indices": [
        "user_events*"
      ],
      "script_fields": {
        "domainsGroup_event": {
          "script": {
            "source": "doc['domainsGroup'].value + doc['event'].value",
            "lang": "painless"
          },
          "ignore_failure": true
        }
      },
      "scroll_size": 1000,
      "delayed_data_check_config": {
        "enabled": true
      },
      "job_id": "hourly_goal_failures_anomalies_job_v1",
      "datafeed_id": "datafeed-hourly_goal_failures_anomalies_job_v1"
    }

What do you see if you run the datafeed preview?

GET _ml/datafeeds/datafeed-hourly_goal_failures_anomalies_job_v1/_preview

What are the mappings for your index?

GET user_events*/_mapping

You should also know that you can get a double split naturally by using both partition_field_name and by_field_name; there's no need to artificially create a script_field.
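As a sketch (field names taken from the sample document above, not verified against your mapping), the detector could use event as the by field and domainsGroup as the partition field, so the datafeed no longer needs the script_field at all:

    "detectors": [
      {
        "function": "count",
        "by_field_name": "event",
        "partition_field_name": "domainsGroup",
        "detector_description": "goal failures count"
      }
    ]

With this shape, both event and domainsGroup also become natural candidates for the influencers list.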

