Symptom -
Any ML job I create runs for the first time and then stops and never runs.
Here is an example Job -
{ "job_id": "tibco-celect", "job_type": "anomaly_detector", "job_version": "6.0.0", "description": "Celect Trends", "create_time": 1528730826156, "finished_time": 1528730828213, "analysis_config": { "bucket_span": "15m", "summary_count_field_name": "doc_count", "detectors": [ { "detector_description": "non_zero_count", "function": "non_zero_count", "detector_rules": [], "detector_index": 0 } ], "influencers": [] }, "data_description": { "time_field": "timestamp", "time_format": "epoch_ms" }, "model_plot_config": { "enabled": true }, "model_snapshot_retention_days": 1, "model_snapshot_id": "1528823700", "results_index_name": "custom-tibco-celect", "data_counts": { "job_id": "tibco-celect", "processed_record_count": 128, "processed_field_count": 128, "input_bytes": 5577, "input_field_count": 128, "invalid_date_count": 0, "missing_field_count": 0, "out_of_order_timestamp_count": 0, "empty_bucket_count": 2, "sparse_bucket_count": 0, "bucket_count": 128, "earliest_record_timestamp": 1528689731828, "latest_record_timestamp": 1528810168631, "last_data_time": 1528810335774, "latest_empty_bucket_timestamp": 1528705800000, "input_record_count": 128 }, "model_size_stats": { "job_id": "tibco-celect", "result_type": "model_size_stats", "model_bytes": 66506, "total_by_field_count": 3, "total_over_field_count": 0, "total_partition_field_count": 2, "bucket_allocation_failures_count": 0, "memory_status": "ok", "log_time": 1528823850000, "timestamp": 1528823700000 }, "datafeed_config": { "datafeed_id": "datafeed-tibco-celect", "job_id": "tibco-celect", "query_delay": "60ms", "frequency": "150s", "indices": [ "tibauditxml-" ], "types": [ "doc" ], "query": { "bool": { "must": [ { "query_string": { "query": "", "fields": [], "type": "best_fields", "default_operator": "or", "max_determinized_states": 10000, "enable_position_increments": true, "fuzziness": "AUTO", "fuzzy_prefix_length": 0, "fuzzy_max_expansions": 50, "phrase_slop": 0, "analyze_wildcard": true, "escape": false, "boost": 1 } }, { "match_phrase": { "EventDescription": { "query": "From Celect: Message sent to sterling successfully", "slop": 0, "boost": 1 } } } ], "adjust_pure_negative": true, "boost": 1 } }, "aggregations": { "buckets": { "date_histogram": { "field": "timestamp", "interval": 900000, "offset": 0, "order": { "_key": "asc" }, "keyed": false, "min_doc_count": 0 }, "aggregations": { "timestamp": { "max": { "field": "timestamp" } } } } }, "scroll_size": 1000, "chunking_config": { "mode": "manual", "time_span": "900000000ms" }, "state": "stopped" }, "state": "opened", "node": { "id": "w-t1K8AzSmm5UBwmQ7FpWg", "name": "w-t1K8A", "ephemeral_id": "4VahUs23RTGO0js5rRT_Gw", "transport_address": "10.75.19.172:9300", "attributes": { "ml.max_open_jobs": "10", "ml.enabled": "true" } }, "open_time": "260859s" }
Job logs -
2018-06-11 11:27:06 w-t1K8A Job created
2018-06-11 11:27:06 w-t1K8A Loading model snapshot [N/A], job latest_record_timestamp [N/A]
2018-06-11 11:27:06 w-t1K8A Opening job on node [{w-t1K8A}{w-t1K8AzSmm5UBwmQ7FpWg}{4VahUs23RTGO0js5rRT_Gw}{10.75.19.172}{10.75.19.172:9300}{ml.max_open_jobs=10, ml.enabled=true}]
2018-06-11 11:27:07 w-t1K8A Starting datafeed [datafeed-tibco-celect] on node [{w-t1K8A}{w-t1K8AzSmm5UBwmQ7FpWg}{4VahUs23RTGO0js5rRT_Gw}{10.75.19.172}{10.75.19.172:9300}{ml.max_open_jobs=10, ml.enabled=true}]
2018-06-11 11:27:07 w-t1K8A Datafeed started (from: 2018-06-11T03:26:40.100Z to: 2018-06-11T15:26:40.102Z)
2018-06-11 11:27:08 w-t1K8A Datafeed lookback completed
2018-06-11 11:27:08 w-t1K8A Datafeed stopped
2018-06-11 11:27:08 w-t1K8A Job is closing
2018-06-11 11:27:22 w-t1K8A Loading model snapshot [1528730828] with latest_record_timestamp [2018-06-11T15:14:49.000Z], job latest_record_timestamp [2018-06-11T15:14:49.575Z]
2018-06-11 11:27:22 w-t1K8A Opening job on node [{w-t1K8A}{w-t1K8AzSmm5UBwmQ7FpWg}{4VahUs23RTGO0js5rRT_Gw}{10.75.19.172}{10.75.19.172:9300}{ml.max_open_jobs=10, ml.enabled=true}]
2018-06-11 11:27:23 w-t1K8A Starting datafeed [datafeed-tibco-celect] on node [{w-t1K8A}{w-t1K8AzSmm5UBwmQ7FpWg}{4VahUs23RTGO0js5rRT_Gw}{10.75.19.172}{10.75.19.172:9300}{ml.max_open_jobs=10, ml.enabled=true}]
2018-06-11 11:27:23 w-t1K8A Datafeed started (from: 2018-06-11T15:14:49.576Z to: real-time)
2018-06-11 11:36:55 w-t1K8A Datafeed stopped
2018-06-12 09:32:11 w-t1K8A Starting datafeed [datafeed-tibco-celect] on node [{w-t1K8A}{w-t1K8AzSmm5UBwmQ7FpWg}{4VahUs23RTGO0js5rRT_Gw}{10.75.19.172}{10.75.19.172:9300}{ml.max_open_jobs=10, ml.enabled=true}]
2018-06-12 09:32:11 w-t1K8A Datafeed started (from: 2018-06-11T16:15:00.000Z to: real-time)
2018-06-12 09:32:16 w-t1K8A Datafeed continued in real-time
2018-06-12 09:32:16 w-t1K8A Datafeed lookback completed
2018-06-12 09:33:41 w-t1K8A Datafeed stopped
2018-06-12 09:34:56 w-t1K8A Starting datafeed [datafeed-tibco-celect] on node [{w-t1K8A}{w-t1K8AzSmm5UBwmQ7FpWg}{4VahUs23RTGO0js5rRT_Gw}{10.75.19.172}{10.75.19.172:9300}{ml.max_open_jobs=10, ml.enabled=true}]
2018-06-12 09:34:56 w-t1K8A Datafeed started in real-time
2018-06-12 10:55:00 w-t1K8A Datafeed has been retrieving no data for a while
2018-06-12 14:34:14 w-t1K8A Datafeed stopped