Advanced ML job, transforming data

palace · August 1, 2018, 7:51pm

Hello,

I am making an advanced ML job in X-Pack that will be partitioned by job type. At the moment, the values of the job_type field in Elasticsearch include a version number, for example "job_cats:release-4". If possible, I would like to trim everything after the colon so the value becomes "job_cats", and then partition my ML job on the job type without the version number (we only need this for these ML jobs). Something similar to:

https://www.elastic.co/guide/en/x-pack/current/ml-configuring-transform.html#ml-configuring-transform3

In short, how do I trim a string so that it keeps only the part before the colon, and then partition the job on the new trimmed value?

Thank you in advance

BenTrent · August 1, 2018, 8:55pm

You totally can. You will have to create the job via the API and partition on the scripted field that the datafeed will create.

Then create a datafeed that defines that scripted field (you may want to check it with the datafeed _preview API to verify that your script behaves as it should).
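Because the job and the datafeed both have to go through the API, the order matters: create the job first, then the datafeed that defines the scripted field, then preview the datafeed. Here is a minimal Python sketch of that sequence. It assumes an unsecured cluster at http://localhost:9200 and the 6.x X-Pack endpoint prefix /_xpack/ml (newer versions use /_ml instead). The helper names ml_request and setup_requests, and the IDs in the test, are hypothetical. The sketch only builds the requests; nothing is sent until you pass each one to urllib.request.urlopen.

```python
import json
import urllib.request

# Assumptions: local unsecured 6.x cluster; 7.x+ uses "/_ml" instead of "/_xpack/ml".
ES_URL = "http://localhost:9200"
ML_PREFIX = "/_xpack/ml"


def ml_request(method, path, body=None):
    """Build (but do not send) a request against the ML REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        ES_URL + ML_PREFIX + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def setup_requests(job_id, feed_id, job_body, datafeed_body):
    """Job first, then the datafeed that defines the scripted field, then a preview."""
    return [
        ml_request("PUT", f"/anomaly_detectors/{job_id}", job_body),
        ml_request("PUT", f"/datafeeds/{feed_id}", datafeed_body),
        ml_request("GET", f"/datafeeds/{feed_id}/_preview"),
    ]
```

If you send the requests in order, the last one should return a sample of the documents the datafeed would feed to the job, including the scripted field. That lets you check the trimmed values before starting the datafeed.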

And that should get you going. To my knowledge, there is no way to do this strictly within Kibana.

palace · August 1, 2018, 9:16pm

So I can't do this in the "Edit JSON" tab of the advanced ML job in Kibana?

And can you please provide an example of how to trim my example value with Painless?

BenTrent · August 1, 2018, 9:26pm

You can create the job in the Edit JSON tab, but I am not 100% sure you can create the datafeed there. You will have to create or update the datafeed outside of it.

As for the scripting part, something like the following SHOULD work, though I am no Painless expert.

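// Keep everything before the first colon; indexOf avoids regexes, which Painless disables by default
String value = doc['job_type'].value; int idx = value.indexOf(':'); return idx >= 0 ? value.substring(0, idx) : value;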
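The Painless script can't run outside Elasticsearch, but its logic is simple enough to check against sample values first. The sketch below mirrors it in Python. It is not Painless, and trim_job_type is a hypothetical name. It keeps everything before the first colon and returns the whole value when there is no colon. One thing this check cannot show: doc['job_type'] needs doc values. If job_type is mapped as text rather than keyword, you would likely need a keyword sub-field instead (for example job_type.keyword, if the default dynamic mapping was used).

```python
def trim_job_type(value: str) -> str:
    """Mirror of the Painless logic: text before the first colon, or the whole value."""
    idx = value.find(":")
    return value[:idx] if idx >= 0 else value


# Sample values: versioned, unversioned, several colons, and a leading colon.
samples = ["job_cats:release-4", "job_cats", "a:b:c", ":release-4"]
for s in samples:
    print(f"{s!r} -> {trim_job_type(s)!r}")
```

The last sample shows an edge case. A value that starts with a colon trims to an empty string, which you may want to handle explicitly before partitioning on it.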

BenTrent · August 2, 2018, 2:18pm

Example JSON Bodies

Job Creation 

{
  "description": "Unusual service behaviour",
  "analysis_config": {
    "bucket_span": "10m",
    "detectors": [
      {
        "detector_description": "Unusual sum for each job",
        "function": "sum",
        "field_name": "total",
        "partition_field_name": "parsed_job_type"
      },
      {
        "detector_description": "Unusually high response for each job",
        "function": "high_mean",
        "field_name": "response",
        "partition_field_name": "parsed_job_type"
      }
    ],
    "influencers": ["host"]
  },
  "data_description": {
    "time_field": "@timestamp",
    "time_format": "epoch_ms"
  }
}

Datafeed Creation

{
  "job_id": "$JOB_ID",
  "indexes": [
    "$INDEX_NAME"
  ],
  "types": [
    "metric"
  ],
  "scroll_size": 1000,
  "script_fields": {
    "parsed_job_type": {
      "script": {
        "lang": "painless",
        "source": "String value = doc['job_type'].value; int idx = value.indexOf(':'); return idx >= 0 ? value.substring(0, idx) : value;"
      }
    }
  }
}
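The two bodies are tied together only by a field name. Both detectors partition on parsed_job_type, and the datafeed's script_fields defines a field with that same key. Below is a small, hypothetical Python check (missing_partition_fields) that can catch a typo in either body before you create them. It assumes the bodies are loaded as dicts, and you must pass in any real index fields yourself.

```python
def missing_partition_fields(job, datafeed, index_fields=()):
    """Return partition fields used by the job that nothing provides.

    A field counts as provided if the datafeed defines it under script_fields
    or it appears in index_fields (fields you know exist in the index).
    """
    scripted = set(datafeed.get("script_fields", {}))
    detectors = job["analysis_config"]["detectors"]
    wanted = {d["partition_field_name"] for d in detectors if "partition_field_name" in d}
    return sorted(wanted - scripted - set(index_fields))


# Trimmed-down versions of the job and datafeed bodies above.
job = {
    "analysis_config": {
        "detectors": [
            {"function": "sum", "field_name": "total", "partition_field_name": "parsed_job_type"},
            {"function": "high_mean", "field_name": "response", "partition_field_name": "parsed_job_type"},
        ]
    }
}
datafeed = {"script_fields": {"parsed_job_type": {"script": {"lang": "painless"}}}}

print(missing_partition_fields(job, datafeed))  # []
```

An empty list means every partition field is accounted for. A misspelled key in script_fields would show up as ['parsed_job_type'].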

palace · August 2, 2018, 4:16pm

Thank you so much, you're awesome!

Related topics

ML Job on Scripted field (March 19, 2018)
Machine Learning on scripted field? (July 10, 2017)
Sub partition in Machine Learning (December 29, 2020)
ML partition by two fields (September 15, 2021)
Question on how to create a simple ML job (October 29, 2018)