Create jobs with field combinations

Hey, I have two fields, field1 and field2, in my data. Right now I'm filtering the data for some combinations of field1 and field2 and creating jobs for those saved searches. What modifications or configuration would make my job automatically filter the data for every combination of field1 and field2 and create a model for each such combination? Is this possible with a multi-metric job (or any other way), or does it have to be implemented via a language client?

One possibility would be to dynamically create a script_field that is the concatenation of field1 and field2:

PUT _xpack/ml/anomaly_detectors/my_job
{
    "analysis_config": {
        "bucket_span": "1h",
        "detectors": [{
            "detector_description": "count per method_status",
            "function": "count",
            "partition_field_name": "method_status"
        }],
        "influencers": ["method", "status"]
    },
    "data_description": {
        "time_field": "@timestamp"
    }
}
PUT _xpack/ml/datafeeds/datafeed-my_job/
{
  "job_id": "my_job",
  "indices": [
    "gallery-*"
  ],
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "method_status": {
      "script": {
        "source": "doc['method'].value + '_' + doc['status'].value",
        "lang": "painless"
      },
      "ignore_failure": false
    }
  }
}
GET _xpack/ml/datafeeds/datafeed-my_job/_preview/

...
 {
    "@timestamp" : 1483244920000,
    "method" : "POST",
    "method_status" : "POST_200",
    "status" : "200"
  },
  {
    "@timestamp" : 1483244949000,
    "method" : "GET",
    "method_status" : "GET_200",
    "status" : "200"
  },
  {
    "@timestamp" : 1483245000000,
    "method" : "GET",
    "method_status" : "GET_200",
    "status" : "200"
  },
...

I tried to create a job with the following request:

PUT /_ml/anomaly_detectors/my_job
{
    "analysis_config": {
        "bucket_span": "15m",
        "detectors": [{
            "detector_description": "count per method_status",
            "function": "low_count"
        }],
        "influencers": ["SHIPPERID", "CARRIERID"]
    },
    "data_description": {
        "time_field": "EVENTTIME"
    }
}

But I got an error as follows:

{
    "error": {
        "root_cause": [
            {
                "type": "status_exception",
                "reason": "This job would cause a mapping clash with existing field [CARRIERID] - avoid the clash by assigning a dedicated results index"
            }
        ],
        "type": "status_exception",
        "reason": "This job would cause a mapping clash with existing field [CARRIERID] - avoid the clash by assigning a dedicated results index",
        "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "Can't merge a non object mapping [CARRIERID] with an object mapping [CARRIERID]"
        }
    },
    "status": 400
}

Can you explain the cause of this error and how to resolve it?

The destination index for the results of your jobs is a shared index called .ml-anomalies-shared. There is apparently already a field in that index (from some other job you've run) named CARRIERID, and its mapping (assignment to a data type) differs from the mapping the CARRIERID field would get from your new job. A single index cannot have two fields with the same name but different mapping types.

To avoid this, add the following to make a dedicated new results index just for that job:

  "results_index_name": "mynewresultsindexname"

for example:

PUT _xpack/ml/anomaly_detectors/my_job
{
    "analysis_config": {
        "bucket_span": "1h",
        "detectors": [{
            "detector_description": "count per method_status",
            "function": "count",
            "partition_field_name": "method_status"
        }],
        "influencers": ["method", "status"]
    },
    "data_description": {
        "time_field": "@timestamp"
    },
    "results_index_name": "mynewresultsindexname"
}

PUT _ml/datafeeds/datafeed-my_job/
{
  "job_id": "my_job",
  "indices": [
    "ab-*"
  ],
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "method_status": {
      "script": {
        "source": "doc['SHIPPERID'].value + '_' + doc['CARRIERID'].value",
        "lang": "painless"
      },
      "ignore_failure": false
    }
  }
}

In the above request, does Elasticsearch filter the documents for every combination of the given fields?
For example, if SHIPPERID="abcd" and CARRIERID="efgh", does it automatically filter the documents with those field values?
Does it create a separate model for every combination of the given fields? I need a separate model to be created for every collection of documents filtered by a combination of the given fields.

As of right now, I'm directly adding some combinations of the given fields as filters and creating individual single-metric jobs for each of the saved searches.

The datafeed you posted does not filter - its script_field simply creates a new field that is the concatenation of two other fields in each document:

method: GET
status:200
method_status: GET_200

It is the ML job configuration, specifically the:

            "partition_field_name": "method_status"

that creates an independent baseline analysis for every instance of method_status - that is, for every observed combination of those two fields.
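
Applied to your fields, the job config would look roughly like the sketch below - assuming your scripted field is named SHIPPERID_CARRIERID (the partition_field_name must match that name exactly) and reusing the dedicated results index from the earlier reply:

PUT _ml/anomaly_detectors/my_job
{
    "analysis_config": {
        "bucket_span": "15m",
        "detectors": [{
            "detector_description": "low_count per SHIPPERID_CARRIERID",
            "function": "low_count",
            "partition_field_name": "SHIPPERID_CARRIERID"
        }],
        "influencers": ["SHIPPERID", "CARRIERID"]
    },
    "data_description": {
        "time_field": "EVENTTIME"
    },
    "results_index_name": "mynewresultsindexname"
}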

PUT _ml/datafeeds/datafeed-my_job/
{
  "job_id": "my_job",
  "indices": [
    "ab-*"
  ],
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "method_status": {
      "script": {
        "source": "doc['SHIPPERID'].value + '_' + doc['CARRIERID'].value",
        "lang": "painless"
      },
      "ignore_failure": false
    }
  }
}

I get the following error:

"error": {
    "root_cause": [
        {
            "type": "status_exception",
            "reason": "[datafeed-my_job] cannot retrieve field [SHIPPERID_CARRIERID] because it has no mappings"
        }
    ],

Is the error produced because the index does not have the field SHIPPERID_CARRIERID, or is there some other reason?

It is because you called your scripted field method_status, not SHIPPERID_CARRIERID (the name that your job's configuration expects).

This is where you did that:

"script_fields": {
"method_status": {
"script": {

You can see what your datafeed returns by:

GET _xpack/ml/datafeeds/datafeed-my_job/_preview/


Can you suggest a blog that explains how to create jobs like these via requests on the console?

There is no specific blog on this, but the online API docs show everything.


Hey, as you suggested, I created the job and started the datafeed for it. However, I couldn't view my job under the Single Metric Viewer. Is it because open jobs can't be viewed in the Single Metric Viewer?

No, jobs can be viewed in Single Metric Viewer as long as they have results.

Check the Job Management page for your newly created job and look at its status there. I'm guessing that if you tried to start the job from the API, you may have started the datafeed but neglected to "open" the job first.

By the way, even if you set the config of the job/datafeed with the API, you can still use the Job Management UI to start/stop the job.

No, I first opened the job with the request:
POST _ml/anomaly_detectors/my_job/_open

and started Datafeeds with request:
POST _ml/datafeeds/datafeed-my_job/_start

This is how my job appears in the Job Management page, and as you can see, the Single Metric Viewer has been disabled.

Ah yes - I forgot. This is because the datafeed creates a script_field, which makes the Single Metric Viewer unable to reconstruct the query needed to paint the time series.

We'll support that in v7.2: https://github.com/elastic/kibana/pull/34079


Hey, I tried to fetch the anomaly results for a specific value of the field SHIPPERID_CARRIERID, but I'm still getting all the results. What needs to be corrected in the following query?

GET .ml-anomalies-.write-my_job_low_sum/_search
{
    "size": 10000,
    "query": {
            "bool": {
              "should": [
                {
                  "match": {
                    "SHIPPERID_CARRIERID": "abcd"
                  }
                }
              ], 
              "filter": [
                  { "term" :  { "result_type" : "record"}},
                  { "range" : { "record_score" : { "gte": "75" } } },
                  { "range" : { "multi_bucket_impact" : { "lt": "-4" } } }
                  ]
            }
    }
}

How do I get only the results from the job that satisfy "SHIPPERID_CARRIERID": "abcd"?

In the results index, the partition value is stored in the field partition_field_value, so filter on that instead:

GET .ml-anomalies-my_job_low_sum/_search
{
    "size": 10000,
    "query": {
            "bool": {
              "filter": [
                  { "term" :  { "result_type" : "record"}},
                  { "term" :  { "partition_field_value" : "abcd"}},
                  { "range" : { "record_score" : { "gte": "75" } } },
                  { "range" : { "multi_bucket_impact" : { "lt": "-4" } } }
                  ]
            }
    }
}

Is it possible to create a scripted job for particular values of another field alongside the SHIPPERID_CARRIERID combination? For example, I have a field field3 with the value "xyz", and I want to create a job with the field combinations of SHIPPERID and CARRIERID, but only for field3="xyz".

Not sure I fully understand. You want to continue to use your scripted field of SHIPPERID_CARRIERID but only analyze this for values of field3="xyz"?

If so, then in your datafeed you'd need to replace the match_all part with a query that limits it to only that field value, i.e. something like:

    "bool": {
        "filter": [
            { "term": { "field3": "xyz" } }
        ]
    }
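
Put together, the datafeed would then look roughly like this sketch (reusing the index pattern and scripted field names from earlier in the thread):

PUT _ml/datafeeds/datafeed-my_job/
{
  "job_id": "my_job",
  "indices": [
    "ab-*"
  ],
  "query": {
    "bool": {
      "filter": [
        { "term": { "field3": "xyz" } }
      ]
    }
  },
  "script_fields": {
    "SHIPPERID_CARRIERID": {
      "script": {
        "source": "doc['SHIPPERID'].value + '_' + doc['CARRIERID'].value",
        "lang": "painless"
      },
      "ignore_failure": false
    }
  }
}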

Hey, I have a question regarding anomalies. Will the anomaly scores reduce or change as more and more data is input to the job? Right now, I'm getting over 2000 anomalies for the scripted job I ran on data with different combinations of the two fields. Will the number of anomalies detected change over time?

In general, yes - the more data there is, the more mature the modeling of that data becomes, and the more accurate the anomaly detection results get.

Plus, keep in mind that not all anomalies are created equal - use the scoring ranges to rank the anomalies by severity and narrow the list down to the number of anomalies you want to deal with.
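
For example, the results query from earlier in the thread can be narrowed to just the higher-severity records and sorted by score - a sketch; adjust the index name and the threshold to your own job:

GET .ml-anomalies-my_job_low_sum/_search
{
  "size": 100,
  "query": {
    "bool": {
      "filter": [
        { "term": { "result_type": "record" } },
        { "range": { "record_score": { "gte": 75 } } }
      ]
    }
  },
  "sort": [
    { "record_score": "desc" }
  ]
}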