Advanced ML job, transforming data

machine-learning

#1

Hello,

I am making an advanced ML job in X-pack that will be partitioned by a certain job-type. At the moment, the data in the field job-type that is in ES includes the version number as: "job_cats:release-4" (this is an example). If possible, I would like to trim all information after the colon for it to become "job_cats" and partition my ML job on the job_type without the version number (this is only useful for us to do for these ML jobs). Something similar to:

https://www.elastic.co/guide/en/x-pack/current/ml-configuring-transform.html#ml-configuring-transform3

In short, how do I trim a string to include only the information before the colon, and then partition the job on the new trimmed value?

Thank you in advance


(Ben Trent) #2

You totally can. You will have to create the job via the API and partition on the to-be-created script field.

Then you can create a datafeed that defines the script field (you may want to check it with _preview to verify your script is acting as it should).

And that should get you going. To my knowledge there is no way to do this strictly within Kibana.
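
A rough sketch of the order of calls described above. It assumes the X-Pack 6.x `_xpack/ml` path prefix (newer releases use `_ml` instead), and the IDs `my_job` and `datafeed-my_job` are hypothetical placeholders. It only lists the requests; it does not contact a cluster.

```python
# Sketch only: lists the REST calls in order without sending anything.
ML_BASE = "_xpack/ml"  # assumption: X-Pack 6.x prefix; newer releases use "_ml"


def ml_setup_calls(job_id, datafeed_id):
    """Return (method, path) pairs: create the job, create the datafeed, preview it."""
    return [
        ("PUT", f"{ML_BASE}/anomaly_detectors/{job_id}"),
        ("PUT", f"{ML_BASE}/datafeeds/{datafeed_id}"),
        ("GET", f"{ML_BASE}/datafeeds/{datafeed_id}/_preview"),
    ]


# Hypothetical IDs, for illustration only.
for method, path in ml_setup_calls("my_job", "datafeed-my_job"):
    print(method, path)
```

The preview call comes last so that the script field output can be inspected before the datafeed is started.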


#3

So I can't do this in the "edit JSON" tab in the advanced ML jobs in Kibana?

And can you please provide an example of how to trim my example with Painless?


(Ben Trent) #4

You can create the job in the Edit JSON tab, but I am not 100% sure you can create the datafeed there. You will have to create/update the datafeed outside of that.

As for the scripting part, something like the following SHOULD work, though I am no painless expert.

String[] parts = /:/.split(doc['job_type'].value); return parts[0];
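
To show the intended result on the example value from this thread, here is a minimal Python sketch of the same trimming idea. It is not Painless and is not run by Elasticsearch; the function name `trim_job_type` is made up for illustration.

```python
def trim_job_type(value):
    """Return the text before the first colon, or the whole value if there is none."""
    return value.split(":")[0]


print(trim_job_type("job_cats:release-4"))  # job_cats
print(trim_job_type("job_cats"))            # job_cats
```

The actual output of the Painless script is best confirmed with the datafeed _preview.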

(Ben Trent) #5

Example JSON Bodies

Job Creation 

{
  "description": "Unusual service behaviour",
  "analysis_config": {
    "bucket_span": "10m",
    "detectors": [
      {
        "detector_description": "Unusual sum for each job",
        "function": "sum",
        "field_name": "total",
        "partition_field_name": "parsed_job_type"
      },
      {
        "detector_description": "Unusually high response for each job",
        "function": "high_mean",
        "field_name": "response",
        "partition_field_name": "parsed_job_type"
      }
    ],
    "influencers": ["host"]
  },
  "data_description": {
    "time_field": "@timestamp",
    "time_format": "epoch_ms"
  }
}

Datafeed Creation

{
  "job_id": "$JOB_ID",
  "indexes": [
    "$INDEX_NAME"
  ],
  "types": [
    "metric"
  ],
  "scroll_size": 1000,
  "script_fields": {
    "parsed_job_type": {
      "script": {
        "lang": "painless",
        "source": "String[] parts = /:/.split(doc['job_type'].value); return parts[0];"
      }
    }
  }
}
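
If you would rather build the datafeed body in code than edit it by hand, something like this Python sketch could help keep the quotes inside the script source plain ASCII. The function name `datafeed_body` and the values `my_job` and `my_index` are hypothetical; the body fields mirror the example in this post.

```python
import json

# Painless source from this thread, written with plain ASCII single quotes.
PAINLESS_SOURCE = "String[] parts = /:/.split(doc['job_type'].value); return parts[0];"


def datafeed_body(job_id, index_name):
    """Build the datafeed JSON body as a string; both arguments are placeholders."""
    body = {
        "job_id": job_id,
        "indexes": [index_name],
        "types": ["metric"],
        "scroll_size": 1000,
        "script_fields": {
            "parsed_job_type": {
                "script": {"lang": "painless", "source": PAINLESS_SOURCE}
            }
        },
    }
    return json.dumps(body, indent=2)


# Hypothetical job and index names, for illustration only.
print(datafeed_body("my_job", "my_index"))
```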

#6

Thank you so much, you're awesome!


(Mark Walkom) #7