Advanced ML job, transforming data




I am building an advanced ML job in X-Pack that will be partitioned by job type. At the moment, the job_type field in ES includes the version number, for example "job_cats:release-4". If possible, I would like to trim everything after the colon so the value becomes "job_cats", and then partition my ML job on job_type without the version number (we only need this for these ML jobs). Something similar to:

In short, how do I trim a string to keep only the part before the colon, and then partition on the new trimmed value?

Thank you in advance.

(Ben Trent) #2

You totally can. You will have to create the job via the API and partition on the to-be-created scripted field.

Then you can create a datafeed that creates the scripted field (you may want to check with _preview to verify that your script behaves as it should).

And that should get you going. To my knowledge there is no way to do this strictly within Kibana.
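A preview request along these lines could be used for that check. This is only a sketch in Kibana Console syntax: $DATAFEED_ID is a placeholder for whatever ID the datafeed was created with, and the _xpack prefix is assumed from the X-Pack era of this thread (more recent versions expose the same endpoint under _ml instead).

```
GET _xpack/ml/datafeeds/$DATAFEED_ID/_preview
```

The response should list sample documents as the datafeed would send them to the job, so seeing "job_cats" in the scripted field for a source value of "job_cats:release-4" would suggest the script is working.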


So I can't do this in the "edit JSON" tab in the advanced ML jobs in Kibana?

And can you please provide an example of how to trim my example value with Painless?

(Ben Trent) #4

You can create the job in the Edit JSON tab, but I am not 100% sure you can create the datafeed there. You will have to create/update the datafeed outside of that.

As for the scripting part, something like the following SHOULD work, though I am no Painless expert.

String[] parts = /:/.split(doc['job_type'].value); return parts[0];
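Depending on the Elasticsearch version, regex literals such as /:/ may be rejected unless script.painless.regex.enabled is set to true in elasticsearch.yml. A regex-free variant of the same idea, sketched below as a script_fields fragment for the datafeed, avoids that dependency. It assumes job_type is a keyword-style field with doc values that always holds a value; parsed_job_type is only an illustrative name for the scripted field.

```json
"script_fields": {
  "parsed_job_type": {
    "script": {
      "lang": "painless",
      "source": "String v = doc['job_type'].value; int i = v.indexOf(':'); return i >= 0 ? v.substring(0, i) : v;"
    }
  }
}
```

With this version, values that contain no colon pass through unchanged, and everything from the first colon onward is dropped otherwise.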

(Ben Trent) #5

Example JSON Bodies

Job Creation

{
  "description" : "Unusual service behaviour",
  "analysis_config" : {
    "detectors" : [
      {
        "detector_description": "Unusual sum for each job",
        "function": "sum",
        "field_name": "total",
        "partition_field_name": "parsed_job_type"
      },
      {
        "detector_description": "Unusually high response for each job",
        "function": "high_mean",
        "field_name": "response",
        "partition_field_name": "parsed_job_type"
      }
    ],
    "influencers": ["host"]
  },
  "data_description" : {
    "time_format": "epoch_ms"
  }
}

Datafeed Creation

{
  "job_id" : "$JOB_ID",
  "indexes" : [ ... ],
  "types" : [ ... ],
  "scroll_size" : 1000,
  "script_fields": {
    "parsed_job_type": {
      "script": {
        "lang": "painless",
        "source": "String[] parts = /:/.split(doc['job_type'].value); return parts[0];"
      }
    }
  }
}


Thank you so much, you're awesome!

(Mark Walkom) #7