Hi, I am trying to compute a ratio in my datafeed. First I created a date_histogram with a max aggregation (because I got an error saying it's needed), and then I added a sub-aggregation by attrs.src_ca_name to calculate the ratio of successful calls. Everything looks fine and there is no error, but the datafeed preview is empty. Can ML parse two levels of sub-aggregation?
..."aggregations": {
  "buckets": {
    "date_histogram": {
      "field": "@timestamp",
      "interval": "15m",
      "time_zone": "UTC"
    },
    "aggregations": {
      "@timestamp": {
        "max": {
          "field": "@timestamp"
        }
      },
      "by_src": {
        "terms": {
          "field": "attrs.src_ca_name",
          "size": 20,
          "order": {
            "_count": "desc"
          }
        },
        "aggregations": {
          "justattempts": { "filter": { "term": { "type": "call-attempt" } } },
          "ratio" : {
            "bucket_script" : {
              "buckets_path": {
                "atmptcnt": "justattempts>_count",
                "totalcnt": "_count"
              },
              "script" : "params.atmptcnt * 100 / params.totalcnt"
            }....
Akaren
July 9, 2018, 12:32pm
But when I run it as a regular search, it returns the correct data:
{
  "took": 448,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1331042,
    "max_score": 1,
    "hits": [
      {
        "_index": "logstash-2018.06.19",
        "_type": "sbc_event",
        "_id": "AWQXBrJ3oA32JM6LbXea",
        "_score": 1,
        "_source": {
          "attrs": {
            "dst_ca_name": "AAAA"
          }
        }
      },
      ..........
  "aggregations": {
    "buckets": {
      "buckets": [
        {
          "key_as_string": "2018-06-19T00:00:00.000Z",
          "key": 1529366400000,
          "doc_count": 274,
          "@timestamp": {
            "value": 1529366699000,
            "value_as_string": "2018-06-19T00:04:59.000Z"
          },
          "by_src": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "BBBBB",
                "doc_count": 224,
                "justattempts": {
                  "doc_count": 217
                },
                "ratio": {
                  "value": 96.875
                }
              },
But when I run this in an ML job, it returns nothing.
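As a quick cross-check of the bucket_script arithmetic, here is the same formula in plain Python, fed with the BBBBB bucket from the search output above (217 call-attempt docs out of 224). The 96.875 that Elasticsearch returned is consistent with floating-point division; integer division would have given 96. This is only an illustration, and bucket_ratio is a hypothetical helper name, not part of any Elastic API.

```python
def bucket_ratio(attempt_count: float, total_count: float) -> float:
    """Same formula as the bucket_script: attempts as a percentage of all docs in the bucket."""
    # Plain true division, matching the 96.875 seen in the search response
    return attempt_count * 100 / total_count


# Values taken from the "BBBBB" bucket in the search response
print(bucket_ratio(217, 224))  # 96.875
```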
Akaren
July 9, 2018, 12:35pm
Here is my whole ML datafeed for the ratio:
PUT _xpack/ml/datafeeds/datafeed-ratio/
{
  "job_id": "ratio_ca",
  "indices": [
    "logstash-2018.06.19"
  ],
  "types": [
    "doc"
  ],
  "query": {
    "bool": {
      "must": [
        {
          "terms": {"type":["call-attempt","call-end"]}
        }
      ],
      "must_not": []
    }
  },
  "aggregations": {
    "buckets": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "15m",
        "time_zone": "UTC"
      },
      "aggregations": {
        "@timestamp": {
          "max": {
            "field": "@timestamp"
          }
        },
        "by_src": {
          "terms": {
            "field": "attrs.src_ca_name",
            "size": 20,
            "order": {
              "_count": "desc"
            }
          },
          "aggregations": {
            "justattempts": { "filter": { "term": { "type": "call-attempt" } } },
            "ratio" : {
              "bucket_script" : {
                "buckets_path": {
                  "atmptcnt": "justattempts>_count",
                  "totalcnt": "_count"
                },
                "script" : "params.atmptcnt * 100 / params.totalcnt"
              }
            }
          }
        }
      }
    }
  }
}
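On the question of whether ML can handle two levels of sub-aggregation: the nesting in this datafeed is date_histogram > terms > (filter, bucket_script). A small walker like the one below, a hypothetical helper rather than anything shipped with Elasticsearch, prints each aggregation's path and type, which makes it easier to confirm that the tree has the shape you intended. It assumes every aggregation body has exactly one type key plus optional "aggregations" or "aggs" children.

```python
from __future__ import annotations

from typing import Any, Iterator

# The datafeed "aggregations" object from the post above, as a Python dict
DATAFEED_AGGS: dict[str, Any] = {
    "buckets": {
        "date_histogram": {"field": "@timestamp", "interval": "15m", "time_zone": "UTC"},
        "aggregations": {
            "@timestamp": {"max": {"field": "@timestamp"}},
            "by_src": {
                "terms": {"field": "attrs.src_ca_name", "size": 20, "order": {"_count": "desc"}},
                "aggregations": {
                    "justattempts": {"filter": {"term": {"type": "call-attempt"}}},
                    "ratio": {
                        "bucket_script": {
                            "buckets_path": {"atmptcnt": "justattempts>_count", "totalcnt": "_count"},
                            "script": "params.atmptcnt * 100 / params.totalcnt",
                        }
                    },
                },
            },
        },
    }
}

SUB_AGG_KEYS = ("aggregations", "aggs")


def walk_aggs(aggs: dict[str, Any], prefix: str = "") -> Iterator[tuple[str, str]]:
    """Yield (path, aggregation type) for every aggregation, depth-first."""
    for name, body in aggs.items():
        path = f"{prefix}>{name}" if prefix else name
        # Assumption: the type is the first key that is not a sub-aggregation container
        agg_type = next(k for k in body if k not in SUB_AGG_KEYS)
        yield path, agg_type
        for key in SUB_AGG_KEYS:
            if key in body:
                yield from walk_aggs(body[key], path)


for path, agg_type in walk_aggs(DATAFEED_AGGS):
    print(f"{path}: {agg_type}")
```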
Can you paste the output from the following command in the Dev Tools Console?
GET _xpack/ml/datafeeds/datafeed-ratio/_preview
Also, please tell us which version of the Elastic Stack you are using.
OK, let me look more closely at your code and see if I can replicate the problem.
Please show me the config of the ML job itself:
GET _xpack/ml/anomaly_detectors/ratio_ca?pretty
Akaren
July 9, 2018, 1:27pm
{
  "count": 1,
  "jobs": [
    {
      "job_id": "ratio_ca",
      "job_type": "anomaly_detector",
      "job_version": "6.1.1",
      "description": "Ratio ca",
      "create_time": 1530537786467,
      "analysis_config": {
        "bucket_span": "15m",
        "summary_count_field_name": "doc_count",
        "detectors": [
          {
            "detector_description": "sum(ratio_ca)",
            "function": "sum",
            "field_name": "ratio_ca2",
            "detector_rules": [],
            "detector_index": 0
          }
        ],
        "influencers": []
      },
      "analysis_limits": {
        "model_memory_limit": "1024mb"
      },
      "data_description": {
        "time_field": "@timestamp",
        "time_format": "epoch_ms"
      },
      "model_plot_config": {
        "enabled": true
      },
      "model_snapshot_retention_days": 1,
      "results_index_name": "shared"
    }
  ]
}
Hmm... your detector references the field_name of ratio_ca2, but in your datafeed definition the calculated field is just called ratio. They need to be the same.
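This kind of mismatch is easy to check mechanically. The sketch below (agg_names and unmatched_detector_fields are hypothetical helper names) collects every aggregation name from the datafeed and reports detector field_name values that none of them produce. It only looks at aggregation names, so treat it as a rough sanity check for aggregated datafeeds, not a full validation of the job.

```python
from __future__ import annotations

from typing import Any, Iterator


def agg_names(aggs: dict[str, Any]) -> Iterator[str]:
    """Yield every aggregation name in a datafeed "aggregations" tree."""
    for name, body in aggs.items():
        yield name
        for key in ("aggregations", "aggs"):
            if key in body:
                yield from agg_names(body[key])


def unmatched_detector_fields(detectors: list[dict[str, Any]], aggs: dict[str, Any]) -> list[str]:
    """Return detector field names that no aggregation in the datafeed produces."""
    names = set(agg_names(aggs))
    return [d["field_name"] for d in detectors if "field_name" in d and d["field_name"] not in names]


# Trimmed copy of the datafeed aggregations posted in this thread
aggs = {
    "buckets": {
        "date_histogram": {"field": "@timestamp", "interval": "15m"},
        "aggregations": {
            "@timestamp": {"max": {"field": "@timestamp"}},
            "by_src": {
                "terms": {"field": "attrs.src_ca_name"},
                "aggregations": {
                    "justattempts": {"filter": {"term": {"type": "call-attempt"}}},
                    "ratio": {"bucket_script": {
                        "buckets_path": {"atmptcnt": "justattempts>_count", "totalcnt": "_count"},
                        "script": "params.atmptcnt * 100 / params.totalcnt",
                    }},
                },
            },
        },
    }
}

print(unmatched_detector_fields([{"function": "sum", "field_name": "ratio_ca2"}], aggs))  # ['ratio_ca2']
print(unmatched_detector_fields([{"function": "sum", "field_name": "ratio"}], aggs))  # []
```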
Akaren
July 9, 2018, 1:55pm
Thanks, but it doesn't help:
{
  "count": 1,
  "jobs": [
    {
      "job_id": "ratio_ca",
      "job_type": "anomaly_detector",
      "job_version": "6.1.1",
      "description": "Ratio ca",
      "create_time": 1531144344777,
      "analysis_config": {
        "bucket_span": "15m",
        "summary_count_field_name": "doc_count",
        "detectors": [
          {
            "detector_description": "sum(ratio_ca)",
            "function": "sum",
            "field_name": "ratio",
            "detector_rules": [],
            "detector_index": 0
          }
        ],
        "influencers": []
      },
      "analysis_limits": {
        "model_memory_limit": "1024mb"
      },
      "data_description": {
        "time_field": "@timestamp",
        "time_format": "epoch_ms"
      },
      "model_plot_config": {
        "enabled": true
      },
      "model_snapshot_retention_days": 1,
      "results_index_name": "shared"
    }
  ]
}
GET _xpack/ml/datafeeds/datafeed-ratio/_preview still returns an empty result.
One other thing to notice is that you do a terms aggregation in the datafeed, which implies you want separate analyses per by_src, but your detector makes no reference to this split. You might want to make your job config something like:
"analysis_config": {
  "bucket_span": "15m",
  "summary_count_field_name": "doc_count",
  "detectors": [
    {
      "detector_description": "sum(ratio)",
      "function": "sum",
      "field_name": "ratio",
      "partition_field_name" : "by_src",
      "detector_rules": [],
      "detector_index": 0
    }
  ],
  "influencers": [ "by_src"]
},
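Conceptually, "partition_field_name": "by_src" means each source gets its own modelled series, instead of the ratio values of all sources in a bucket being summed together by the sum detector. The grouping step alone can be illustrated in Python as below, using the two preview records quoted later in this thread. This only shows the split, not how ML models each partition internally, and split_by_partition is a hypothetical name.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any


def split_by_partition(
    records: list[dict[str, Any]], partition_field: str, value_field: str
) -> dict[str, list[tuple[int, float]]]:
    """Group flattened datafeed records into one (timestamp, value) series per partition value."""
    series = defaultdict(list)
    for rec in records:
        series[rec[partition_field]].append((rec["@timestamp"], rec[value_field]))
    return dict(series)


# Preview-style records as quoted in this thread (doc_count only, no ratio yet)
records = [
    {"@timestamp": 1486426496000, "by_src": "AAA", "doc_count": 15},
    {"@timestamp": 1486426496000, "by_src": "BBB", "doc_count": 11},
]

for src, points in split_by_partition(records, "by_src", "doc_count").items():
    print(src, points)
```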
Akaren
July 9, 2018, 7:00pm
Thank you, I have set it, but it's still not working.
Sorry this is giving you trouble, but I can't immediately see what your issue is, and I cannot reproduce the problem in a similar setup; mine works fine.
This hints at some subtle syntax error that's hard to spot.
May I suggest that you debug by starting simple and building up to your desired end state? For example, define the ML job, then define the ML datafeed without the bucket_script aggregation: just the date_histogram, the max on @timestamp, and the terms aggregations.
Then run the datafeed _preview to see what you get. You should just get a bucketized count for each by_src, similar to:
[
  {
    "@timestamp": 1486426496000,
    "by_src": "AAA",
    "doc_count": 15
  },
  {
    "@timestamp": 1486426496000,
    "by_src": "BBB",
    "doc_count": 11
  },
If you can get that, then go back to adding the bucket_script aggregation.
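To see why a preview has this shape, here is a rough Python approximation of the flattening for this particular aggregation layout: one record per (time bucket, by_src term), timestamped with the max(@timestamp) value and carrying the term's doc_count, plus ratio once the bucket_script is back. It is fed with the BBBBB bucket from the search output earlier in the thread. This is a hand-rolled sketch under those assumptions, not Elastic's datafeed extractor; for simplicity it ignores the justattempts filter bucket, and flatten is a hypothetical name.

```python
from __future__ import annotations

from typing import Any


def flatten(response_aggs: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten date_histogram > terms buckets into one record per (time bucket, term)."""
    records = []
    for time_bucket in response_aggs["buckets"]["buckets"]:
        # Use the max(@timestamp) sub-aggregation as the record time
        timestamp = time_bucket["@timestamp"]["value"]
        for term_bucket in time_bucket["by_src"]["buckets"]:
            record = {
                "@timestamp": timestamp,
                "by_src": term_bucket["key"],
                "doc_count": term_bucket["doc_count"],
            }
            if "ratio" in term_bucket:  # only present once the bucket_script is added back
                record["ratio"] = term_bucket["ratio"]["value"]
            records.append(record)
    return records


# Trimmed from the search response posted earlier in this thread
response_aggs = {
    "buckets": {
        "buckets": [
            {
                "key": 1529366400000,
                "doc_count": 274,
                "@timestamp": {"value": 1529366699000},
                "by_src": {"buckets": [
                    {"key": "BBBBB", "doc_count": 224,
                     "justattempts": {"doc_count": 217}, "ratio": {"value": 96.875}},
                ]},
            }
        ]
    }
}

print(flatten(response_aggs))
# [{'@timestamp': 1529366699000, 'by_src': 'BBBBB', 'doc_count': 224, 'ratio': 96.875}]
```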
Akaren
July 11, 2018, 7:32am
I have found the problem. It was the field:
"types": [
  "doc"
]
Should this be the name of the aggregation? If I run the datafeed without it, it works.
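It is not an aggregation name. It is an index property that is a holdover from Elasticsearch versions before 6.x and will be removed fully in 7.x. Your documents have "_type": "sbc_event" (see the search output above), so restricting the datafeed to "doc" matched no documents, which is why the preview was empty: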
https://www.elastic.co/guide/en/elasticsearch/reference/6.2/ml-put-datafeed.html
types (array): A list of types to search for within the specified indices. For example: []. This property is provided for backwards compatibility with releases earlier than 6.0.0. For more information, see Removal of mapping types.
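A quick way to spot this kind of mismatch is to compare the _type values returned by a plain search against the datafeed's types list. The sketch below (types_mismatch is a hypothetical helper) uses the hit from the search output earlier in the thread. It assumes that an empty types list means no type filtering; that is worth verifying on your version.

```python
from __future__ import annotations

from typing import Any


def types_mismatch(hits: list[dict[str, Any]], datafeed_types: list[str]) -> set[str]:
    """Return the _type values seen in search hits that the datafeed's types filter would exclude."""
    if not datafeed_types:  # assumption: empty list means no type filtering
        return set()
    return {hit["_type"] for hit in hits} - set(datafeed_types)


# The hit in the search output has _type "sbc_event", while the datafeed asked for "doc"
hits = [{"_index": "logstash-2018.06.19", "_type": "sbc_event", "_id": "AWQXBrJ3oA32JM6LbXea"}]

print(types_mismatch(hits, ["doc"]))  # {'sbc_event'}
print(types_mismatch(hits, []))  # set()
```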
Akaren
July 11, 2018, 6:22pm
Thank you very much for your help!