Elasticsearch machine learning datafeed configuration error

I'm trying to alter the datafeed for my ML jobs, but the same error keeps occurring.
What I want to do is transform a value like 'abc/def/ghi' into 'abc', so I wrote the following script:

  "runtime_mappings": {
    "my_runtime_field": {
      "type": "keyword",
      "script": {
        "source": "def m = /^\/(\w+|_)\//.matcher(doc['tokenstring3'].value); emit(m.find() ? m.group(1): '');"
      }
    }
  }

But every time I try to type any escape character, two sets of quotation marks are automatically added around my code, making it look like this.

      "script": {
        "source": """def m = /^\/(\w+|_)\//.matcher(doc['tokenstring3'].value); emit(m.find() ? m.group(1): ''"");

And when I delete the quotation marks, the screen goes black.

Does anyone know how to fix this error?

@zerojin63 I have been able to configure a runtime field similar to the one you are trying to use in your datafeed. The double quotes at the end of the second snippet you posted just need editing.

I am using the Advanced wizard on the Machine Learning page in Kibana, and then editing the JSON used in the datafeed to this, for the data view (index pattern) I am using:

"runtime_mappings": {
  "uri_path_first": {
    "type": "keyword",
    "script": {
      "source": """def m = /^\/(\w+|_)\//.matcher(doc['uri'].value); emit(m.find() ? m.group(1): '');"""
    }
  }
}
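If it helps, you can sanity-check a runtime field like this with a plain search request in Dev Tools before wiring it into the datafeed, since search requests also accept runtime_mappings. The index name (gallery-*) and source field (uri) below follow my example and will differ in your setup:

```json
GET gallery-*/_search
{
  "size": 3,
  "_source": false,
  "fields": ["uri_path_first"],
  "runtime_mappings": {
    "uri_path_first": {
      "type": "keyword",
      "script": {
        "source": """def m = /^\/(\w+|_)\//.matcher(doc['uri'].value); emit(m.find() ? m.group(1): '');"""
      }
    }
  }
}
```

If the field comes back populated in the `fields` section of the hits, the script itself is fine and any remaining problem is in the datafeed configuration.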

You can see in the datafeed preview that the runtime field is getting populated (most of my entries are empty, but there are non-empty ones too), and I can use the field in a job:

Note we have an issue open to switch to a new JSON editor throughout the Machine Learning UI, which would allow you to define the script using the same syntax you posted in your first snippet. Depending on which version of the stack you have, you can also define this field in the data view (index pattern) itself, using that simpler syntax, and it will be available to use in the anomaly detection job.

Hope this works for you.
Pete


The first one (datafeed configuration) still didn't work but the second one did!

This is the code I used.

if (doc["field_name.keyword"].size() != 0) {
    def path = doc["field_name.keyword"].value;
    if (path != null) {
        int firstSlashIndex = path.indexOf('/');
        if (firstSlashIndex > 0) {
            emit(path.substring(0, firstSlashIndex));
            return;
        }
    }
}
emit("");
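In case it is useful to others, a script like this can also be tried out ahead of time with a search-time runtime_mappings block in Dev Tools before saving it in the data view. The index pattern, the runtime field name, and "field_name.keyword" below are all placeholders to substitute with your own:

```json
GET my-index-*/_search
{
  "size": 3,
  "_source": false,
  "fields": ["path_first_segment"],
  "runtime_mappings": {
    "path_first_segment": {
      "type": "keyword",
      "script": {
        "source": """
          if (doc["field_name.keyword"].size() != 0) {
            def path = doc["field_name.keyword"].value;
            if (path != null) {
              int firstSlashIndex = path.indexOf('/');
              if (firstSlashIndex > 0) {
                emit(path.substring(0, firstSlashIndex));
                return;
              }
            }
          }
          emit("");
        """
      }
    }
  }
}
```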

Thank you so much for your help :+1:


I've run into another problem.

An error occurs during the job validation process when I use the runtime field I added as a detector field in the anomaly detection job.

This is the error that keeps occurring.

[screenshot of the error]

I've checked that the runtime field I added works fine in the 'Discover' section.
I've also checked that no error occurs if I use other fields (the original fields, not the one I added) as the detector.
So it must have something to do with the field I added, but I don't see what's wrong with it.
Do you happen to have any idea about this situation?

Apologies for the delay in replying, @zerojin63. The error you are seeing when trying to use the field in an ML job looks like it is caused by the datafeed preview step, which validates the job configuration, taking too long. The fact that the field shows up in Discover implies that the configuration of the field itself is fine.

What version of the stack do you have, and which of the anomaly detection job wizards are you using? If you use the advanced job wizard, add the detector using the runtime field, then hit 'Edit JSON' and take a look at the datafeed preview panel. Do you see values for your runtime field in there, like I see here:

You could also create the job directly in Kibana Dev Tools, and try running the datafeed preview from there. For example, with my config:

PUT _ml/anomaly_detectors/test1
{
  "analysis_config": {
    "bucket_span": "4h",
    "detectors": [
      {
        "function": "sum",
        "field_name": "bytes",
        "partition_field_name": "uri_first_path",
        "detector_description": "sum(bytes) partitionfield=uri_first_path"
      }
    ],
    "influencers": [
      "uri_first_path"
    ]
  },
  "data_description": {
    "time_field":"@timestamp"
  },
  "datafeed_config": {
    "datafeed_id": "datafeed-test1",
    "indices": ["gallery-*"],
    "runtime_mappings": {
      "uri_first_path": {
        "type": "keyword",
        "script": {
          "source": """def m = /^\/(\w+|_)\//.matcher(doc['uri'].value); emit(m.find() ? m.group(1): '');"""
        }
      }
    }
  }
}

GET _ml/datafeeds/datafeed-test1/_preview

Thanks for your reply.
Here are the answers to your questions.

  1. What version of the stack do you have?
    -> I'm using v7.17.3
  2. Which of the anomaly detection job wizards am I using?
    -> I used the advanced job wizard as well, and I was able to see values for my runtime field.

So I don't think there's anything wrong with my runtime field script, but the error still occurs during the job validation process.
I'm using 'rare' as my anomaly detection job detector. Could this be the reason for the error?

You shouldn't have any problems using the runtime field in a 'rare' detector. I have stepped through the advanced wizard and the validation step succeeded. My job and datafeed config and preview look like this (I have two runtime fields defined in my data view, but the job is only using one of them):
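For comparison, a minimal analysis_config with a 'rare' detector over a runtime field might look like the sketch below. Note that 'rare' takes its field as by_field_name rather than field_name; the field name here follows my earlier example and the bucket_span is arbitrary:

```json
"analysis_config": {
  "bucket_span": "15m",
  "detectors": [
    {
      "function": "rare",
      "by_field_name": "uri_first_path",
      "detector_description": "rare by uri_first_path"
    }
  ],
  "influencers": ["uri_first_path"]
}
```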

Are you able to share your job and datafeed config?

Did you try creating the job and running the datafeed preview inside Kibana Dev Tools as shown above?