ML Multi-Metric query fails when similar Single-Metric is OK

Vitaly_il · August 24, 2017, 1:15pm

I'm starting to experiment with ML jobs, but I've run into an issue with a Multi-Metric job.
It fails with:

Datafeed is encountering errors extracting data: [ml-multi-low-count-test]
Search request returned shard failures; first failure: shard [[YuBCwg][logstash-general-2017.08.09][0]], reason [RemoteTransportException[[elasticsearch][127.0.0.1:9300]
[indices:data/read/search[phase/query]]]; nested: QueryShardException[No mapping found for [@timestamp] in order to sort on]; ]; see logs for more info

A Single-Metric job on the same indices works fine. As far as I can see, time_field is the same in both cases:

"data_description": {
"time_field": "@timestamp",
"time_format": "epoch_ms"

Any ideas how to debug this?

richcollier · August 25, 2017, 7:37pm

Hi Vitaly,

Please run the following for both jobs (the working single-metric and the non-working multi-metric):

curl -u elastic:changeme -XGET 'localhost:9200/_xpack/ml/anomaly_detectors/<job_id>?pretty'

curl -u elastic:changeme -XGET 'localhost:9200/_xpack/ml/datafeeds/datafeed-<job_id>?pretty'

Then we can compare the two configurations
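
Once both responses are saved, the comparison can be scripted instead of done by eye. The following is a minimal sketch, assuming the two JSON responses have already been loaded into Python dictionaries; `flatten` and `diff_configs` are hypothetical helper names, not part of any Elastic client.

```python
ABSENT = "<absent>"  # marker for a path that exists in only one config


def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into {"dotted.path": value} pairs."""
    if isinstance(obj, dict) and obj:
        items = obj.items()
    elif isinstance(obj, list) and obj:
        items = enumerate(obj)
    else:
        # Scalars and empty containers are kept as leaf values.
        return {prefix[:-1]: obj}
    flat = {}
    for key, value in items:
        flat.update(flatten(value, f"{prefix}{key}."))
    return flat


def diff_configs(a, b):
    """Return {path: (value_in_a, value_in_b)} for every path that differs."""
    fa, fb = flatten(a), flatten(b)
    return {
        path: (fa.get(path, ABSENT), fb.get(path, ABSENT))
        for path in sorted(set(fa) | set(fb))
        if fa.get(path, ABSENT) != fb.get(path, ABSENT)
    }
```

Calling `diff_configs(single_job, multi_job)` on the two job (or datafeed) responses lists only the settings that differ, which is usually a short list.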

Vitaly_il · August 27, 2017, 5:13pm

Thank you, please see below:

1)
{
  "count" : 1,
  "jobs" : [
    {
      "job_id" : "ml-test-single-metric-low-count",
      "job_type" : "anomaly_detector",
      "job_version" : "5.5.2",
      "description" : "2nd 24.08.2017",
      "create_time" : 1503565565772,
      "finished_time" : 1503565567428,
      "analysis_config" : {
        "bucket_span" : "5m",
        "summary_count_field_name" : "doc_count",
        "detectors" : [
          {
            "detector_description" : "low_count",
            "function" : "low_count",
            "detector_rules" : [ ],
            "detector_index" : 0
          }
        ],
        "influencers" : [ ]
      },
      "data_description" : {
        "time_field" : "@timestamp",
        "time_format" : "epoch_ms"
      },
      "model_plot_config" : {
        "enabled" : true
      },
      "model_snapshot_retention_days" : 1,
      "model_snapshot_id" : "1503828750",
      "results_index_name" : "shared"
    }
  ]
}
2)
{
  "count" : 1,
  "jobs" : [
    {
      "job_id" : "ml-multi-low-count-test",
      "job_type" : "anomaly_detector",
      "job_version" : "5.5.2",
      "description" : "3rd",
      "create_time" : 1503568129335,
      "finished_time" : 1503568130891,
      "analysis_config" : {
        "bucket_span" : "5m",
        "detectors" : [
          {
            "detector_description" : "low_count",
            "function" : "low_count",
            "partition_field_name" : "type.keyword",
            "detector_rules" : [ ],
            "detector_index" : 0
          }
        ],
        "influencers" : [
          "type.keyword"
        ]
      },
      "data_description" : {
        "time_field" : "@timestamp",
        "time_format" : "epoch_ms"
      },
      "model_snapshot_retention_days" : 1,
      "results_index_name" : "shared"
    }
  ]
}

richcollier · August 28, 2017, 1:44pm

Vitaly - Please also collect the output from the data feed config for each job. See above.

Vitaly_il · August 28, 2017, 4:29pm

Thank you, here it is:

{
  "count" : 1,
  "datafeeds" : [
    {
      "datafeed_id" : "datafeed-ml-multi-low-count-test",
      "job_id" : "ml-multi-low-count-test",
      "query_delay" : "60s",
      "frequency" : "150s",
      "indices" : [
        "logstash-*"
      ],
      "types" : [
        "newrelic-nrsysmond",
        "bvpyzabbix",
        "rabbitmq",
        ........
      ],
      "query" : {
        "match_all" : {
          "boost" : 1.0
        }
      },
      "scroll_size" : 1000,
      "chunking_config" : {
        "mode" : "auto"
      }
    }
  ]
}

richcollier · August 28, 2017, 5:29pm

^^ missing the one for the single metric job

My guess is that the multi-metric job is hitting several data "types" in your "logstash-*" indices that don't all share the same timestamp field.

Also, so that you're prepared: "types" are being deprecated by Elasticsearch in v6.0.

Vitaly_il · August 28, 2017, 7:10pm

Just so you know - “types” are being deprecated by Elasticsearch in v6.0.
Just so you’re prepared…
Thank you.
BTW, there are about 20 types; I just cut the list short.

It was in my previous post, after the separator.
here it is:

{
  "count" : 1,
  "datafeeds" : [
    {
      "datafeed_id" : "datafeed-ml-test-single-metric-low-count",
      "job_id" : "ml-test-single-metric-low-count",
      "query_delay" : "60s",
      "frequency" : "150s",
      "indices" : [
        "logstash-*"
      ],
      "types" : [
        "newrelic-nrsysmond",
        .........
      ],
      "query" : {
        "match_all" : {
          "boost" : 1.0
        }
      },
      "aggregations" : {
        "buckets" : {
          "date_histogram" : {
            "field" : "@timestamp",
            "interval" : 300000,
            "offset" : 0,
            "order" : {
              "_key" : "asc"
            },
            "keyed" : false,
            "min_doc_count" : 0
          },
          "aggregations" : {
            "@timestamp" : {
              "max" : {
                "field" : "@timestamp"
              }
            }
          }
        }
      },
      "scroll_size" : 1000,
      "chunking_config" : {
        "mode" : "manual",
        "time_span" : "300000000ms"
      }
    }
  ]
}

richcollier · August 28, 2017, 7:15pm

Right, so the types above ^^ are extraneous compared to the config of the single-metric job (which works for you). Clone your existing multi-metric job (to keep most of the config parameters), remove the extraneous types, and then try to run the job.

Vitaly_il · August 29, 2017, 4:41am

Sorry for the confusing output - "types" are the same in both cases. I just
truncated the output differently (the original output has 50 types).

richcollier · August 29, 2017, 11:50am

Oh ok! That's good to know.

I think that you must have one (or more) of those 50 types with a missing mapping for @timestamp (??)
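
One way to test this guess is to scan the mapping response for types that lack a date-mapped @timestamp. This is a rough sketch, assuming a 5.x-style response from `GET logstash-*/_mapping` (keyed by index, then "mappings", then type) already loaded as a dictionary; `types_missing_field` is a hypothetical helper name, and it only checks top-level fields.

```python
def types_missing_field(mappings, field="@timestamp", expected_type="date"):
    """List (index, type) pairs whose mapping lacks `field` as `expected_type`.

    `mappings` is assumed to be shaped like the 5.x GET _mapping response:
    {index_name: {"mappings": {type_name: {"properties": {...}}}}}
    """
    missing = []
    for index_name, body in mappings.items():
        for type_name, type_mapping in body.get("mappings", {}).items():
            properties = type_mapping.get("properties", {})
            if properties.get(field, {}).get("type") != expected_type:
                missing.append((index_name, type_name))
    return sorted(missing)
```

Any pair it returns is a candidate for the "No mapping found for [@timestamp]" failure and is worth inspecting by hand.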

In the single-metric job, the query to the index automatically includes a date_histogram aggregation on the field @timestamp (as you can see above). The multi-metric job does not do this. So, perhaps, the single-metric job's aggregation is masking the problem in your data?
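
The difference between the two datafeeds can be illustrated with the two kinds of search body involved. This is only a sketch: the exact requests that ML sends are not shown in this thread, and the sort clause is inferred from the "in order to sort on" error message. The function names are hypothetical. Elasticsearch rejects a sort on a field with no mapping on a shard (unless `unmapped_type` is given), whereas an aggregation on an unmapped field simply produces no buckets for that shard.

```python
def time_range_query(time_field, start_ms, end_ms):
    """Range filter on the time field, shared by both request shapes."""
    return {
        "range": {
            time_field: {"gte": start_ms, "lt": end_ms, "format": "epoch_millis"}
        }
    }


def scroll_search_body(time_field, start_ms, end_ms, size=1000):
    """Assumed shape of a non-aggregated (multi-metric style) extraction:
    raw documents sorted by time, which needs a mapping on every shard."""
    return {
        "size": size,
        "query": time_range_query(time_field, start_ms, end_ms),
        "sort": [{time_field: {"order": "asc"}}],
    }


def aggregated_search_body(time_field, start_ms, end_ms, interval_ms=300000):
    """Shape matching the single-metric datafeed above: no hits, no sort,
    only a date_histogram with a max sub-aggregation on the time field."""
    return {
        "size": 0,
        "query": time_range_query(time_field, start_ms, end_ms),
        "aggregations": {
            "buckets": {
                "date_histogram": {
                    "field": time_field,
                    "interval": interval_ms,
                    "min_doc_count": 0,
                },
                "aggregations": {time_field: {"max": {"field": time_field}}},
            }
        },
    }
```

Sending the first body to `logstash-*/_search` by hand should reproduce the shard failure on the offending index, while the second should not.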

I can also suggest that you inspect the elasticsearch.log file when hitting the datafeed "preview" for the problematic job:

GET _xpack/ml/datafeeds/datafeed-ml-multi-low-count-test/_preview/

Vitaly_il · August 30, 2017, 1:16pm

Thank you, it really was one of the indices without a proper timestamp mapping. The job runs as long as we don't include this index.
Vitaly

Vitaly_il · September 7, 2017, 8:33am

Rich,

I'd appreciate your help again. After further investigation, it seems that the ML job fails not on indices without a proper @timestamp field mapping, but on empty indices.

Job fails with this message:

Datafeed is encountering errors extracting data: [ml-multicount-all-indices-count-test] Search request returned shard failures; first failure: shard [[pREMVKEzTWe-J3vLYuBCwg][logstash-general-2017.09.02][0]], reason [RemoteTransportException[[staging-elk-elasticsearch-][127.0.0.1:9300][indices:data/read/search[phase/query]]]; nested: QueryShardException[No mapping found for [@timestamp] in order to sort on]; ]; see logs for more info

This index is empty, and its mapping is:

"logstash-general-2017.09.02": {
  "mappings": {
    "default": {
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "geoip": {
          "properties": {
            "location": {
              "type": "geo_point"
            }
          }
        }
      }
    }
  }
}
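
To find every index in this state, the document counts reported by the cat indices API can be filtered. This is a small sketch, assuming the rows come from `GET _cat/indices/logstash-*?format=json`, where each row carries "index" and "docs.count" (as a string); `empty_indices` is a hypothetical helper name.

```python
def empty_indices(cat_rows):
    """Return the names of indices whose reported document count is zero.

    Rows without a usable "docs.count" (for example closed indices)
    are skipped rather than treated as empty.
    """
    return sorted(
        row["index"] for row in cat_rows if row.get("docs.count") in ("0", 0)
    )
```

The resulting names can then be checked against the datafeed's index pattern, or the indices excluded or deleted.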

richcollier · September 8, 2017, 6:44pm

Vitaly,

Under what conditions do you have empty indices? That shouldn't be the case if you're using daily indices.

Also, I'll remind you that, as a best practice, you should separate different kinds of data into different indices. In v6.0, Elastic will be deprecating the "_type" mechanism, which allows different types of data to exist in the same physical index.

Vitaly_il · September 10, 2017, 6:01pm

As far as I can see, the Logstash output filter sends incorrectly formatted data, such as two comma-separated strings in the "type" field.

Yes, I remember. But AFAIK Elastic performance may be affected by a large number of shards?

system · October 8, 2017, 6:01pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
