for Anomaly detection. However, I can't seem to get any results in the explorer. The datafeed job has been running for about a day now and has processed about 32k records. I have followed all the instructions on that GitHub page, but I am not sure how long this job needs to run before results appear in the Anomaly Explorer. Can someone please help?
Missing anomaly results could be due to a few factors. The following checks cover the most common causes...
Check for sufficient data
For typical data, the job needs to run for at least 2 hours or 20 buckets (whichever is longer) for the model to initialize and results to be written. I would expect sufficient run time in this instance.
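To verify how far the job has actually progressed, the job stats API reports processed record and bucket counts (a sketch, assuming a 5.x cluster with X-Pack ML; substitute your own job ID for suspicious_login_activity):

```
GET _xpack/ml/anomaly_detectors/suspicious_login_activity/_stats
```

The data_counts section of the response includes processed_record_count and the latest record timestamp seen by the job.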
Check results exist
The Anomaly Explorer only shows anomalies, and it is possible that the data simply contains none for this time period, so first check whether any results are being written to Elasticsearch at all. In Dev Tools, run:
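For example, a search like the following should return bucket result documents (a sketch; the .ml-anomalies-&lt;job_id&gt; alias is created automatically when the job is created, so substitute your own job ID):

```
GET .ml-anomalies-suspicious_login_activity/_search
{
  "query": { "term": { "result_type": "bucket" } },
  "sort": [ { "timestamp": { "order": "desc" } } ]
}
```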
One document should be returned per bucket timestamp, even if no anomalies were found. Check that the timestamps of the results match the Kibana time picker.
Check input data
The input data fields may be missing. The job may be processing records, but if many of them have missing values, no data is being modelled and consequently no results are written.
To check what the data feed is sending for analysis, run:
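The datafeed preview API returns a sample of the documents that would be sent for analysis (assuming the UI's default datafeed naming of datafeed-&lt;job_id&gt;; adjust to match your setup):

```
GET _xpack/ml/datafeeds/datafeed-suspicious_login_activity/_preview
```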
Check that values are returned for all of the fields that are used in the job configuration.
If all of the above checks seem OK, then please send us the job info. In the Job Management page, expand the job entry and select the JSON tab. Please include this content for us to see.
Was this job running in real-time, or did you select a start date that was historical, e.g. something like 2 weeks ago?
If you are running in real-time, then the datafeed runs every 150s and selects data from greater than 60s ago. If your data takes longer than 60s to ingest, you'll need to adjust some of the datafeed settings. You can do this in the UI: in Job Management, select the Edit icon for this job, and click on the Datafeed tab. Then edit the query delay (and frequency, if needed).
(You'll need to stop the datafeed first).
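For reference, the two timing settings correspond to these fields in the datafeed JSON (values shown are the defaults mentioned above; increasing query_delay gives slow-ingesting data more time to arrive before each search):

```
"frequency": "150s",
"query_delay": "60s"
```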
If you were running on historical data (i.e. you selected a datafeed start time fairly far in the past), then the datafeed preview should hold the answer: are the correct fields being returned for analysis?
Could you please check on the names of the filebeat indices in your elasticsearch instance?
It is possible that the filebeat indices have a date pattern in the name, e.g. filebeat-2017.07.27. If this is the case, then I see a problem with the datafeed configuration.
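A quick way to check is to list the matching indices in Dev Tools:

```
GET _cat/indices/filebeat*?v
```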
The datafeed is configured to read from the index filebeat, but with date-stamped index names it should read from the index pattern filebeat-*.
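In the datafeed JSON, that corresponds to the indexes setting (shown here as a sketch; 5.x datafeed configs use an indexes array):

```
"indexes": [ "filebeat" ]
```

should be:

```
"indexes": [ "filebeat-*" ]
```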
To correct this, using the UI, clone the ML job.
In the Job Details tab
-- Give it a new distinct name e.g. suspicious_login_activity_2
-- Click on Use dedicated index (this is not strictly necessary, but may avoid other issues for troubleshooting purposes)
Go to the Datafeed tab
-- Change the Index from filebeat to filebeat-*. Make sure the Time-field name is selected, as well as All types.
(Note: You may have different values for time-field name and a different list of types - just pick from the list that is pre-populated).
Go to the Datafeed Preview tab - here you should be able to see a sample of the data to be analyzed.
Click Start datafeed
-- Select to run from the beginning of the data to Now (or continue in real-time)
Hope this gets you a little closer. I'll ask the Examples team to double check the recipe is working end to end. It really should be updated to 5.5 by now.