Datafeed stops immediately when performing custom machine learning job


#1

I'm working with the machine learning plugin for Elastic search and I'm trying to figure out why my datafeed immediately fails when starting the job shown below. I used logstash to pull in data from sql server. The logs I am working with are pretty simple, no nested objects or anything like that. Also to note I am running directly in kibana, and not using the API directly if that makes any difference. Any ideas on what might be happening would be appreciated.

(Log Info)
[2017-05-30T19:22:19,745][INFO ][o.e.x.m.j.p.a.AutodetectProcessManager] [beast] Opening job [directip_remoteip]
[2017-05-30T19:22:19,746][INFO ][o.e.x.m.j.p.a.AutodetectProcessManager] [beast] [directip_remoteip] Loading model snapshot [N/A], job latest_record_timestamp [N/A]
[2017-05-30T19:22:19,747][INFO ][o.e.x.m.j.p.a.NativeAutodetectProcessFactory] Restoring quantiles for job 'directip_remoteip'
[2017-05-30T19:22:20,072][INFO ][o.e.x.m.j.p.a.AutodetectProcessManager] [beast] Successfully set job state to [opened] for job [directip_remoteip]
[2017-05-30T19:22:20,341][INFO ][o.e.x.m.d.DatafeedManager] Starting datafeed [datafeed-directip_remoteip] for job [directip_remoteip] in [1970-01-01T00:00:00.000Z, 2017-05-30T19:22:17.001Z)
[2017-05-30T19:22:20,353][INFO ][o.e.x.m.d.DatafeedManager] [no_realtime] attempt to stop datafeed [datafeed-directip_remoteip] for job [directip_remoteip]
[2017-05-30T19:22:20,353][INFO ][o.e.x.m.d.DatafeedManager] [no_realtime] try lock [20s] to stop datafeed [datafeed-directip_remoteip] for job [directip_remoteip]...
[2017-05-30T19:22:20,353][INFO ][o.e.x.m.d.DatafeedManager] [no_realtime] stopping datafeed [datafeed-directip_remoteip] for job [directip_remoteip], acquired [true]...
[2017-05-30T19:22:20,354][INFO ][o.e.x.m.d.DatafeedManager] [no_realtime] datafeed [datafeed-directip_remoteip] for job [directip_remoteip] has been stopped
[2017-05-30T19:22:20,551][INFO ][o.e.x.m.j.p.a.AutodetectProcessManager] [beast] Closing job [directip_remoteip], because [close job (api)]
[2017-05-30T19:22:20,551][INFO ][o.e.x.m.j.p.l.CppLogMessageHandler] [directip_remoteip] [autodetect/23997] [CCmdSkeleton.cc@75] Handled 0 records
[2017-05-30T19:22:20,552][INFO ][o.e.x.m.j.p.l.CppLogMessageHandler] [directip_remoteip] [autodetect/23997] [CAnomalyDetector.cc@1704] Pruning all models
[2017-05-30T19:22:20,552][INFO ][o.e.x.m.j.p.a.o.StateProcessor] [beast] [directip_remoteip] State output finished
[2017-05-30T19:22:20,558][INFO ][o.e.x.m.j.p.a.o.AutoDetectResultProcessor] [directip_remoteip] 0 buckets parsed from autodetect output
[2017-05-30T19:22:20,577][INFO ][o.e.x.m.j.p.a.AutodetectCommunicator] [directip_remoteip] job closed

(Json output for job)
{ "job_id": "directip_remoteip", "job_type": "anomaly_detector", "description": "", "create_time": 1496162012641, "finished_time": 1496167391287, "analysis_config": { "bucket_span": "5m", "detectors": [ { "detector_description": "high_distinct_count(remoteip) over directcallerid", "function": "high_distinct_count", "field_name": "remoteip", "over_field_name": "directcallerid", "detector_rules": [] } ], "influencers": [ "directcallerid", "remoteip" ] }, "data_description": { "time_field": "logdate", "time_format": "epoch_ms" }, "model_snapshot_retention_days": 1, "results_index_name": "custom-directip_remoteip", "data_counts": { "job_id": "directip_remoteip", "processed_record_count": 0, "processed_field_count": 0, "input_bytes": 0, "input_field_count": 0, "invalid_date_count": 0, "missing_field_count": 0, "out_of_order_timestamp_count": 0, "empty_bucket_count": 0, "sparse_bucket_count": 0, "bucket_count": 0, "input_record_count": 0 }, "model_size_stats": { "job_id": "directip_remoteip", "result_type": "model_size_stats", "model_bytes": 0, "total_by_field_count": 0, "total_over_field_count": 0, "total_partition_field_count": 0, "bucket_allocation_failures_count": 0, "memory_status": "ok", "log_time": 1496167391000, "timestamp": -300000 }, "datafeed_config": { "datafeed_id": "datafeed-directip_remoteip", "job_id": "directip_remoteip", "query_delay": "60s", "frequency": "150s", "indexes": [ "sqllogs2" ], "types": [ "DOCUMENT_TYPE" ], "query": { "match_all": { "boost": 1 } }, "scroll_size": 1000, "chunking_config": { "mode": "auto" }, "state": "stopped" }, "state": "opened", "node": { "id": "7Nr2bK0qQRainbrPMvFgBw", "name": "beast", "ephemeral_id": "_kKigoXLT9eY4fXUQx3TKQ", "transport_address": "myip:9300", "attributes": { "ml.enabled": "true" } }, "open_time": "0s" }


(Dimitris Athanasiou) #2

Hi,

It seems like the datafeed could not match any documents.

From the configuration that you posted, one thing that stands out is that the "types" in the "datafeed_config" is set to [ "DOCUMENT_TYPE" ]. The reason it stands out is that "DOCUMENT_TYPE" is an unusual name for a type, so I just wanted to confirm this is indeed the type of your documents in index "sqllogs2".

Another thing to check, is the documents' time range. Are documents "logdate" values within [1970-01-01T00:00:00.000Z, 2017-05-30T19:22:17.001Z) ?

Finally, you can make use of the following end-point to quickly test what will actually be sent to the job:

curl -XGET http://localhost:9200/_xpack/ml/datafeeds/datafeed-directip_remoteip/_preview

Let me know if these helped. I'll be happy to help further if these do not reveal the reason no data is matched.


#3

DOCUMENT_TYPE is what is listed in logstash configuration as far as the document type name, and is the only document type that is in there, it was a default that wasn't changed, but that is correct.

I have confirmed that all the logdates that are included are within that date range.

Below are my results from the curl command. It appears that logstash has defaulted all the fields to type text, since it is dynamically creating the document model in the index. It is odd though as I can preview data in Kibana when looking at the preview section for the advanced job, and it comes back with the total number of hits as there are docs in the index.

curl -XGET http://localhost:9200/_xpack/ml/datafeeds/datafeed-directip_remoteip/_preview

{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Fielddata is disabled on text fields by default. Set fielddata=true on [directcallerid] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"fetch","grouped":true,"failed_shards":[{"shard":0,"index":"sqllogs2","node":"7Nr2bK0qQRainbrPMvFgBw","reason":{"type":"illegal_argument_exception","reason":"Fielddata is disabled on text fields by default. Set fielddata=true on [directcallerid] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."}}]},"status":400}root

Maybe I need to be explicit for the document model for logstash to make the fields keywords?


(Dimitris Athanasiou) #4

Thank you for posting the output of the _preview endpoint here.

Yes, the problem is the mappings. By default, ML is fetching the data via doc_values. Text fields do not have doc_values.

There are 2 solutions to your problem:

  1. You can reindex the data changing logstash to map the fields as keywords. This is harder to do but will result to better ML performance.
  2. Simply edit the JSON of the job and in the "datafeed_config" add: "_source": true. This tells the datafeed to fetch the values from the source document.

#5

Thanks for the help. After seeing the latest output I sent you it made more sense as to what the problem actually was. It would be nice if those types of errors would bubble up into the Machine Learning plugin for Kibana instead of a generic datafeed stopped message.


(Dimitris Athanasiou) #6

Thank you for the feedback, it is very much appreciated. We are working on improving the feedback users get when such errors occur.


(system) #7

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.