Machine Learning on scripted field?

I want to be able to run machine learning on an ingested field, but need to do a transformation to that field to make it useable. The transformation is the conversion of a score from a log scale to a linear scale.

I have set up a scripted field in Kibana called Linear_Score and I can see the calculated values in Discover. But when I try to set up a single metric job on the field, no preview of scores is shown. When I run the job, it spins for a minute, then stops after looking through 0 documents. There is no error message or anything else.

After some digging, this article makes me think that I might need to make this script part of the Elasticsearch query itself. But I can't figure out where in the JSON query I need to put the scripted field so that it becomes part of the learning.

I can't find any documentation on this and there are no helpful error messages. I can write my own scripted field in Dev Tools, but nothing is working when I add the code to the machine learning job JSON.

Hello,

Yes, the script field must be defined as part of the datafeed configuration (https://www.elastic.co/guide/en/x-pack/5.4/ml-put-datafeed.html). Scripted fields defined in Kibana exist only in Kibana, so the ML datafeed never sees them.

To accomplish this:

  1. Create an advanced job (this technique only works with advanced jobs).
  2. Configure the job as you normally would. When creating detectors, instead of choosing an existing field, type in the name that we'll later assign to the script field. In this example, we are choosing the field name total_error_count, which doesn't exist in our documents.
  3. Once your job is configured as you like, go to the "Edit JSON" tab
  4. Append a new script_fields parameter inside the datafeed_config object. The syntax for script_fields is identical to that used by Elasticsearch. You can find more information on the syntax here. We'll add our total_error_count script field to the script_fields object. The script will do a simple addition of two fields in the document to produce a "total" error count:
  "datafeed_config": {
    "query": {
      "match_all": {}
    },
    "query_delay": "60s",
    "frequency": "150s",
    "scroll_size": 1000,
    "indexes": [
      "rally-2017"
    ],
    "types": [
      "metrics"
    ],
    "script_fields": {
      "total_error_count": {
        "script": {
          "lang": "painless",
          "inline": "doc['error_count'].value + doc['aborted_count'].value"
        }
      }
    }
  }

When you are done editing the JSON, you can verify the output of your script in the "Data Preview" tab. When satisfied, press Save.
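For the log-to-linear case from the original question, the same edit could look roughly like the sketch below. It only builds the JSON to paste into the "Edit JSON" tab; it does not talk to Elasticsearch. The helper name add_script_field, the source field name score, and the base-10 log scale are all assumptions for illustration, so adjust the Painless expression to match your actual field and scale.

```python
import json


def add_script_field(datafeed_config, name, painless_source):
    """Return a copy of datafeed_config with one extra script field (hypothetical helper)."""
    updated = dict(datafeed_config)
    script_fields = dict(updated.get("script_fields", {}))
    # "inline" is the key used in the 5.4 example above.
    script_fields[name] = {
        "script": {"lang": "painless", "inline": painless_source}
    }
    updated["script_fields"] = script_fields
    return updated


# Same datafeed settings as the example above, without any script fields yet.
base_config = {
    "query": {"match_all": {}},
    "query_delay": "60s",
    "frequency": "150s",
    "scroll_size": 1000,
    "indexes": ["rally-2017"],
    "types": ["metrics"],
}

# Assumption: the ingested field is called "score" and holds a base-10 log value.
linear_config = add_script_field(
    base_config, "Linear_Score", "Math.pow(10, doc['score'].value)"
)

print(json.dumps({"datafeed_config": linear_config}, indent=2))
```

The detector would then reference Linear_Score as its field name, the same way total_error_count is referenced in the example.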

You'll notice that our detector references "total_error_count", which is generated at runtime by the script. Every time the datafeed retrieves a document from Elasticsearch, the script is evaluated and its result is returned as a "virtual" field. The ML job then analyzes that field like any other.
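To make the "virtual field" idea concrete, here is a small stand-alone sketch of what happens conceptually for each document. It is plain Python, not how Elasticsearch is implemented; the function apply_script_field and the sample documents are made up for illustration.

```python
def apply_script_field(docs, name, script):
    """Return new documents with a computed field added; the stored docs are untouched."""
    return [{**doc, name: script(doc)} for doc in docs]


# Made-up sample documents with the two fields used in the example script.
docs = [
    {"error_count": 3, "aborted_count": 1},
    {"error_count": 0, "aborted_count": 2},
]

# Python equivalent of: doc['error_count'].value + doc['aborted_count'].value
with_total = apply_script_field(
    docs, "total_error_count", lambda d: d["error_count"] + d["aborted_count"]
)

for doc in with_total:
    print(doc)
```

The point is that the computed value exists only in what the datafeed receives; nothing is written back to the index.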

Thank you! That is working now and I also hadn't come across the documentation you pointed me towards.
