X-pack Single metric job

Hi,
Anyone there?
Can anyone explain the math behind the calculation of the actual value, which is responsible for the anomaly score calculation?
I have read the theory, but I need to understand the math/algorithm behind it.

In a single metric job I calculated the sum(field) within a 30-minute bucket span,
but the calculated sum value was not equal to the value displayed in the anomaly chart.

The actual value in your case is the sum(field) within the time range of:

[bucket_time, bucket_time + 30m)

meaning inclusive of the start of the bucket, but 1ms shy of the end time. So, for example, here's an anomaly that shows the actual value of "6386" between 5:45am and 6:00am

However, the reality is, this is the value between 05:45:00.000 and 05:59:59.999 as seen by this Kibana visualization:

For reference, if the end-time of the kibana visualization was 06:00:00.000, the sum would actually be different because there is a document in this index with that timestamp, thus increasing the sum:

Bottom line, the bucket that ML uses is "up to but not including" the end time.
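As an illustration (the index name, timestamp field, metric field, and date below are placeholders, not taken from this job), the sum for a single 30-minute ML bucket corresponds to a range query with an inclusive start (gte) and an exclusive end (lt):

#Illustration only: reproduce one ML bucket's sum with gte/lt range bounds
GET my-index/_search
{
  "size": 0,
  "query": {
    "range": {
      "@timestamp": {
        "gte": "2018-03-20T05:45:00.000Z",
        "lt": "2018-03-20T06:00:00.000Z"
      }
    }
  },
  "aggs": {
    "bucket_sum": {
      "sum": { "field": "my_field" }
    }
  }
}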


Thanks richcollier,
After excluding the upper limit value it works fine.
I have one more doubt: the upper bound and lower bound of the model become more accurate as more data is fed in. Am I correct?
If so, what is the algorithm/math running behind the scenes that gives the upper bound & lower bound values?

The algorithms are too complex to explain here. Perhaps you can glean some insight by watching these videos from past ElasticONs:

2017 ElasticON:

https://www.elastic.co/elasticon/conf/2017/sf/machine-learning-and-statistical-methods-for-time-series-analysis

2018 ElasticON:

https://www.elastic.co/elasticon/conf/2018/sf/the-math-behind-elastic-machine-learning


Hi, yes, I will go through them.

I have trouble with anomaly record retrieval.
I am using the following command to retrieve anomaly records:
curl -X GET "localhost:9200/_xpack/ml/anomaly_detectors/sum_single_metric_anomaly/results/records" -H 'Content-Type: application/json' -d'
{
  "sort": "record_score",
  "desc": true,
  "start": "2018-03-20T15:00:00.000Z",
  "end": "2018-03-20T17:00:00.000Z"
}'

But the record shown in the image below is not getting captured, even though it falls within the chosen interval.

Hello Sanriya,

I'm not sure what timezone you are in, but keep in mind that you are asking ML's API for anomalies between 15:00 and 17:00 UTC. However, your Kibana screenshot is showing information in your local timezone. Are you sure you're asking for the right window of time in your API query?
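For instance, if your local timezone happened to be UTC+05:30 (purely an assumed offset for illustration - substitute your own), the 15:00-17:00 local window would be expressed in UTC like this:

#Records query with the local 15:00-17:00 window converted to UTC (assumes a UTC+05:30 offset, illustration only)
GET _xpack/ml/anomaly_detectors/sum_single_metric_anomaly/results/records
{
  "sort": "record_score",
  "desc": true,
  "start": "2018-03-20T09:30:00.000Z",
  "end": "2018-03-20T11:30:00.000Z"
}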

Yes, okay. Thank you.
Can we customize the thresholds for the severity levels?
For example, in the anomaly chart, if the anomaly score is above 75 it is critical and shaded with red.
Is there any way to customize the threshold levels?

I'm not sure what you mean by "customize the threshold level" - the scores (between 0 and 100) are dynamically calculated (see this blog for more information: https://www.elastic.co/blog/machine-learning-anomaly-scoring-elasticsearch-how-it-works)

We've chosen the color scheme based on our own defined intervals:
0-25: blue
25-50: yellow
50-75: orange
75-100: red

One cannot change these color mappings.

Thanks for the quick reply, richcollier.
My doubt was whether we can change the intervals.
For example, if I wanted to set only two intervals:
0-50: blue
50-100: red
Is it possible?

From your reply I understood that it is not possible for anyone to change the mappings (interval or colour) in any version. Is that correct?

I am just curious to know whether we can tweak things like the interval limits or the colour of those anomaly dots.

I do have 2 more questions:

  1. In any ML job I create using X-Pack, I am always forced to select a bucket span and aggregation method.
    Is there any way to feed the absolute value of each record without specifying a bucket span and aggregation?

  2. My second question is: when creating a job, is there any way to use my own aggregation logic?

Again, the scores are not customizable, and the colors and ranges are not customizable if you want to use our built-in UI. But you can certainly build a custom UI (including one using TSVB or Canvas) to view the ML results in any way you would like.

As for your questions:

In any ML job I create using X-Pack, I am always forced to select a bucket span and aggregation method.
Is there any way to feed the absolute value of each record without specifying a bucket span and aggregation?

No - our ML works by pre-aggregating documents using bucketization.
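As a rough sketch of what that bucketization means (the index, timestamp, and metric field names here are placeholders, not taken from your job), the datafeed effectively computes one aggregated value per bucket_span, similar to a date_histogram such as:

#Illustration only: one sum per 30m bucket, roughly what bucketization produces for a sum detector
GET my-index/_search
{
  "size": 0,
  "aggs": {
    "over_time": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "30m"
      },
      "aggs": {
        "bucket_sum": {
          "sum": { "field": "my_field" }
        }
      }
    }
  }
}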

My second question is: when creating a job, is there any way to use my own aggregation logic?

Yes, see this blog

Thank you richcollier for guiding me all the way.
I read the blog on creating a custom aggregation.
I am completely new to X-Pack and the dev console, so I might be raising basic doubts.

I did the following steps

Step (1) : Created a Job
Step (2) : Created a datafeed to the Job
Step (3) : Opened the Job
Step (4) : Started the datafeed

Up to Step (4) it went fine.

But when I tried to take the log(ReadResponse_d) column and feed it as input to the model, I was not able to do it.
I also wanted to divide ReadResponse_d by 10 and feed that as input to my job.
Is it possible?
If so, let me know the built-in API/syntax/function that might help.

Below is my API:
#To Create a new Job
PUT _xpack/ml/anomaly_detectors/firsttrail
{
  "description": "First simple job",
  "analysis_config": {
    "bucket_span": "5m",
    "latency": "0ms",
    "detectors": [
      {
        "detector_description": "sum(ReadResponse_d)",
        "function": "sum",
        "field_name": "ReadResponse_d"
      }
    ]
  },
  "data_description": {
    "time_field": "StartTime-Stamp"
  }
}

#To create data feed for Job created
PUT _xpack/ml/datafeeds/datafeedfirsttrail_job
{
  "job_id": "firsttrail",
  "indexes": ["check_nos_data"],
  "query": {
    "match_all": {
      "boost": 1
    }
  }
}

#Open a Job
POST _xpack/ml/anomaly_detectors/firsttrail/_open
{
  "timeout": "35m"
}

#Start Data Feed to the Job
POST _xpack/ml/datafeeds/datafeedfirsttrail_job/_start
{
  "start": "2017-04-07T18:22:16Z"
}

If you want to affect the value of a single field in your query, you can leverage the built-in capability of Elasticsearch to create a "scripted field": https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-script-fields.html

This is an on-the-fly way to create a new value from existing values in the data. If you construct your query using scripted fields, then you can have the ML job use this query in the datafeed.

See other forum questions on using scripted fields in ML jobs: https://discuss.elastic.co/search?expanded=true&q=scripted%20field%20tags%3Amachine-learning
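For instance, the log transform and the divide-by-10 transform you mentioned could be expressed as script fields along these lines. This is only an untested sketch using the painless language; Response_Log and Response_Div10 are illustrative names. The snippet would sit inside the datafeed body:

#Sketch only: script fields deriving new values from ReadResponse_d
"script_fields": {
  "Response_Log": {
    "script": {
      "lang": "painless",
      "source": "Math.log(doc['ReadResponse_d'].value)"
    }
  },
  "Response_Div10": {
    "script": {
      "lang": "painless",
      "source": "doc['ReadResponse_d'].value / 10.0"
    }
  }
}

The detector's field_name in the job config would then reference the script field name (e.g. Response_Log) rather than the original field.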

Hi richcollier,
Sorry for the delayed response. I was pulled into some other activity.

I followed the steps below to create a scripted field and came up with two issues.
Step 1) Created a scripted field in the datafeed
Step 2) Created an anomaly job with the scripted field as the detector
I went to the Machine Learning tab to view the anomalies, and I had two issues there.

Issue 1) The datafeed started properly and anomalies were generated, but then a warning was thrown, which I have highlighted in pink.
I didn't get any error in the datafeed preview in the dev console, and I cross-checked the values for a sample record; they were correct.
I was also not able to view the anomaly chart in the single metric viewer. When I searched around, I noticed this is a known issue. Does that issue still persist?

Issue 2) I am still worried that I am forced to specify an aggregation function on top of the scripted field. Can I expect to feed input to the model without aggregation in any upcoming release?

Below is my API:
#Create job with scripted field
PUT _xpack/ml/anomaly_detectors/firsttrail
{
  "description": "First simple job",
  "analysis_config": {
    "bucket_span": "5m",
    "latency": "0ms",
    "detectors": [
      {
        "detector_description": "sum(datafeedfirsttrail_job)",
        "function": "sum",
        "field_name": "Response_Scripted"
      }
    ]
  },
  "data_description": {
    "time_field": "StartTime-Stamp"
  }
}

#Create datafeed with scripted field
PUT _xpack/ml/datafeeds/datafeedfirsttrail_job
{
  "job_id": "firsttrail",
  "indexes": ["check_nos_data_etc"],
  "query": {
    "match_all": {
      "boost": 1
    }
  },
  "script_fields": {
    "Response_Scripted": {
      "script": {
        "lang": "expression",
        "source": "doc['ReadResponse_d'].value / 2"
      }
    }
  }
}

#Open a Job
POST _xpack/ml/anomaly_detectors/firsttrail/_open
{
  "timeout": "35m"
}

#Start Data Feed to the Job
POST _xpack/ml/datafeeds/datafeedfirsttrail_job/_start
{
  "start": "2017-04-07T18:22:16Z"
}

#Preview the datafeed with the created scripted field
GET _xpack/ml/datafeeds/datafeedfirsttrail_job/_preview

Since you verified that your setup was correct (i.e. the datafeed preview was returning the correct results) and that the job did indeed process some records, I suspect that the problems behind the "Datafeed is encountering errors" might be subtle - you should look in the elasticsearch.log file around the time that the job was encountering these problems and see what detailed information the logs contain.

The single metric viewer will not properly show you the chart of the metric over time because it doesn't know how to "reverse engineer" your scripted field query in order to draw the chart. You can work around this by forcing the enabling of the "model plot" for the single metric job. Add the following JSON to your job config:

  "model_plot_config": {
    "enabled": true
  },

so that:

PUT _xpack/ml/anomaly_detectors/firsttrail
{
  "description": "First simple job",
  "analysis_config": {
    "bucket_span": "5m",
    "latency": "0ms",
    "detectors": [
      {
        "detector_description": "sum(datafeedfirsttrail_job)",
        "function": "sum",
        "field_name": "Response_Scripted"
      }
    ]
  },
  "model_plot_config": {
    "enabled": true
  },
  "data_description": {
    "time_field": "StartTime-Stamp"
  }
}

Issue 2) I am still worried that I am forced to specify an aggregation function on top of the scripted field. Can I expect to feed input to the model without aggregation in any upcoming release?

You will always need to supply a function. Even if there is only 1 value per bucket_span in the datafeed, sum(somefieldname) is the same as the value of somefieldname.

Hi I added

"model_plot_config": {
    "enabled": true
  },

Even after adding it, the single metric anomaly chart is still not generated.

Regarding "Datafeed is encountering errors extracting data: no such index" -
I checked the elasticsearch log file and didn't find an error logged when the job was created. When creating the job through the UI it does not show any warning.
When I use the API in the dev console it shows the warning, immediately after defining the datafeed for the job. Any thoughts on this?

Reproducing a job with a detector using a scripted field, and with model plot enabled

"model_plot_config": {
    "enabled": true
  }

has revealed that the Single Metric Viewer is not enabled as it should be when model plot is enabled, even when the detector uses a scripted field. I have created a GitHub issue for this - https://github.com/elastic/kibana/issues/32124 - and will look to get this fixed in an upcoming release.

Thanks
Pete


Hi ,

I am doing data transformation in Logstash
and am facing a problem when using multiline.

Sample Log:
11 This 96.112.248.81
'create' =>
array (
'key1' => 'value1',
'key2' => 'value2',
'key6' => 'value3'
),
)
[2014-03-02 17:34:20] - 127.0.0.1|DEBUG| flush_multi_line
Code Snippet 1
filter {
  multiline {
    pattern => "[0-9]*\s+%{DATA}\s+%{IP}"
    what => "previous"
    negate => true
  }
  if "|DEBUG| flush_multi_line" in [message] {
    drop {} # We don't need the dummy line so drop it
  }
  kv {
    field_split => "\n"
    #value_split => ":"
    source => "message"
  }
}

Code Snippet 2
filter {
  multiline {
    pattern => "^11 This %{IP}"
    what => "previous"
    negate => true
  }
  if "|DEBUG| flush_multi_line" in [message] {
    drop {} # We don't need the dummy line so drop it
  }
  kv {
    field_split => "\n"
    source => "message"
  }
}

Code Snippet 1 works and Code Snippet 2 doesn't.
The only difference is the pattern in multiline:
In Snippet 1 the pattern is "[0-9]*\s+%{DATA}\s+%{IP}"
In Snippet 2 the pattern is "^11 This %{IP}"

Can you spot the mistake I am making in Snippet 2?

Hi,

Would you please be able to open a new topic for this logstash data transformation question? That will ensure it is directed to a logstash expert who will be able to help you with a solution.

Many thanks
Pete

Yes, okay.
