ML Datafeed lookback retrieved no data

I am following the use case for Machine Learning for Elastic Stack found at the link below:
Suspicious Login Activity

My setup:
Ubuntu 16.04, Elastic Stack 5.6.8 (Elasticsearch, Logstash, Kibana, Filebeat, X-Pack)

I loaded the job by running the ./reset_job.sh suspicious_login_activity script. The links to the repository are below.
reset_job.sh
job.json and data_feed.json

When I go to start the job, it gives me an error:

Datafeed lookback retrieved no data

I have even created this job from scratch.

This is my data showing in the Discover tab.

Below is a copy of the job.json

{
  "job_id": "suspicious_login_activity",
  "description": "suspicious login activity",
  "job_type": "anomaly_detector",
  "analysis_config": {
    "bucket_span": "5m",
    "detectors": [
      {
        "detector_description": "high_count",
        "function": "high_count",
        "partition_field_name": "system.auth.hostname",
        "detector_rules": []
     }
    ],
    "influencers": [
      "system.auth.hostname",
      "system.auth.user",
      "system.auth.ssh.ip"
    ]
  },
  "data_description": {
    "time_field": "@timestamp",
    "time_format": "epoch_ms"
  },
  "model_plot_config": {
      "enabled" : true
  }
}

and the data_feed.json

{
  "datafeed_id": "datafeed-suspicious_login_activity",
  "job_id": "suspicious_login_activity",
  "indexes": [
    "filebeat-*"
  ],
  "types": [
    "doc"
  ],
  "query": {
  "query_string": {
    "query": "system.auth.ssh.event:Failed OR system.auth.ssh.event:Invalid",
    "fields": [],
    "use_dis_max": true,
    "auto_generate_phrase_queries": false,
    "max_determinized_states": 10000,
    "enable_position_increments": true,
    "fuzziness": "AUTO",
    "fuzzy_prefix_length": 0,
    "fuzzy_max_expansions": 50,
    "phrase_slop": 0,
    "analyze_wildcard": true,
    "escape": false,
    "split_on_whitespace": true,
    "boost": 1
      }
  },
  "scroll_size": 1000,
  "query_delay": "60s",
  "frequency": "150s"
}
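
(For completeness: my understanding is that reset_job.sh essentially just PUTs these two files to the ML APIs, roughly like this; I haven't verified the exact script contents.)

PUT _xpack/ml/anomaly_detectors/suspicious_login_activity
{ ...contents of job.json... }

PUT _xpack/ml/datafeeds/datafeed-suspicious_login_activity
{ ...contents of data_feed.json... }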

Also, when previewing the datafeed with the request below, I get no data back.

GET _xpack/ml/datafeeds/datafeed-suspicious_login_activity/_preview

I am not understanding how to fix this.

Hi Tabitha,

From your screenshot it appears you have changed the date of the dataset to be March 15th, right?

What you observe here is a timezone issue. In the dataset for this example, the timestamps do not include a timezone, and Elasticsearch interprets such timestamps as UTC. Kibana, however, converts them to your browser's timezone, so the data probably looks fine to you in Discover. The ML datafeeds, on the other hand, work with UTC times.

From your first screenshot, you can see the datafeed's end time is: 2018-03-15T15:25:05.001Z. I suspect that the whole dataset's timestamps are after that time in UTC, which is why the datafeed retrieves no data.

I would suggest trying the example without modifying the date, or setting the date to yesterday (enough time to cover the difference between UTC and your timezone).
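
If you want to double-check this, a range query bounded by the datafeed's end time shows how many documents the lookback could actually see in UTC (just a sketch, assuming the data is in filebeat-*):

POST /filebeat-*/_search
{
  "size": 0,
  "query": {
    "range": {
      "@timestamp": { "lte": "2018-03-15T15:25:05.001Z" }
    }
  }
}

If hits.total comes back as 0, the lookback has nothing to retrieve.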

Dmitri,
Thank you so much for your quick response. I reloaded the data yesterday, which is why the timestamp shows 3/15/18; the original date of the data is 4/19/17, when it was created (not when it was loaded into ES). I have tried starting the job from the beginning of the data: I chose the option of 3/14/17 for the start and set the end time to today to cover the timezone difference, but it is still giving me the same message.

I was even following the post below from the forum because it was similar; my indices also have a date pattern. I followed the steps there to resolve it, but that did not yield anything different.
Security analytics recipes

Tabitha,

I am a bit confused. Looking at the data, I can't see how it could be dated 4/19/17. In fact, that example should be updated: as it stands (the timestamps lack a year), Elasticsearch will interpret them as being in 2018, so all the timestamps will fall in the future. Apologies for the confusion there.

In any case, you should be able to run the datafeed even if it deals with future dates. We just have to give it a suitable end date. Also note you won't be able to run the datafeed in real time mode.

Use Discover to determine the full range of the data in the index; note you might have to adjust the date picker to look into the future. The data seems to span about 3.5 weeks. Then run the datafeed, giving it an end date that covers the end of the data range.
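
If Discover is awkward for looking into the future, a min/max aggregation on @timestamp gives you the exact range directly (again assuming the filebeat-* pattern):

POST /filebeat-*/_search
{
  "size": 0,
  "aggs": {
    "earliest": { "min": { "field": "@timestamp" } },
    "latest": { "max": { "field": "@timestamp" } }
  }
}

The earliest and latest values (in UTC) are the start and end dates to give the datafeed.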

I hope this makes sense. Let me know how it went.

Dmitri,

Please don't laugh... but I have too! I do completely understand. Using Discover, I found that my time span is Mar 31 - April 20. So I have cloned the job to start fresh, and I chose filebeat-* (all types) as the index for the datafeed. A preview of the data is below:
[screenshot of the datafeed preview]
So I just chose a start date of Mar 1 and an end date of April 30, but this time span is not fixing the issue.
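
For reference, if I understand the 5.6 API correctly, my start/end choice should be equivalent to starting the datafeed like this (the dates are just the ones I picked in the UI):

POST _xpack/ml/datafeeds/datafeed-suspicious_login_activity/_start
{
  "start": "2018-03-01T00:00:00Z",
  "end": "2018-04-30T00:00:00Z"
}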

I have a question. You can see from the first post what my job detectors are. Below is a picture of the available fields that were indexed into ES.


If I do a query in the Discover tab looking for system.auth.hostname, I am not finding it. I am concerned that the data might not have been indexed correctly into ES.

Hi Tabitha,

That would explain a lot! :slight_smile:

One way to test this would be to run the following query:

POST /filebeat-*/_search
{
  "query": {
    "exists": {"field": "system.auth.hostname"}
  }
}

Could you paste the response of this query here?

This is the output.

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 10,
    "successful": 10,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

Right, so that indeed means the data has not been indexed, at least not the way the recipe was expecting.

May I suggest starting clean and repeating the installation & setup section of the recipe?
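
Once the data is re-indexed, another quick check is the field mapping; if Filebeat's system module ran as expected, system.auth.hostname should show up there (assuming the default filebeat-* index pattern):

GET filebeat-*/_mapping/field/system.auth.hostname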

So I started from a clean installation and setup, but this time I followed the recipe exactly: using the ingest-geoip plugin and shipping data from Filebeat to Elasticsearch.
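
(A tip in case anyone else hits this: listing the ingest pipelines is a quick way to confirm the Filebeat system module pipelines were actually installed before any data was shipped.)

GET _ingest/pipeline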

I ran the same query from your last post again, and the output is below. I did have to cut some of it out due to the length.

 {
  "took": 20,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 7121,
    "max_score": 1,
    "hits": [
      {
        "_index": "filebeat-2018.03.20",
        "_type": "doc",
        "_id": "AWJEEEPDitLpAGyvfzj3",
        "_score": 1,
        "_source": {
          "@timestamp": "2018-03-27T13:06:56.000Z",
          "system": {
            "auth": {
              "hostname": "ip-10-77-20-248",
              "pid": "1291",
              "program": "sshd",
              "message": "Server listening on 0.0.0.0 port 22.",
              "timestamp": "Mar 27 13:06:56"
            }
          },
          "offset": 81,
          "beat": {
            "hostname": "CASEV-611-ESML",
            "name": "test",
            "version": "5.6.8"
          },
          "input_type": "log",
          "source": "/opt/data/auth.log",
          "fileset": {
            "module": "system",
            "name": "auth"
          },
          "type": "log"
        }
      },
      {
        "_index": "filebeat-2018.03.20",
        "_type": "doc",
        "_id": "AWJEEEPDitLpAGyvfzj6",
        "_score": 1,
        "_source": {
          "@timestamp": "2018-03-27T13:06:56.000Z",
          "system": {
            "auth": {
              "hostname": "ip-10-77-20-248",
              "pid": "1118",
              "program": "systemd-logind",
              "message": "Watching system buttons on /dev/input/event1 (Sleep Button)",
              "timestamp": "Mar 27 13:06:56"
        }
      },
      "offset": 385,
      "beat": {
        "hostname": "CASEV-611-ESML",
        "name": "test",
        "version": "5.6.8"
      },
      "input_type": "log",
      "source": "/opt/data/auth.log",
      "fileset": {
        "module": "system",
        "name": "auth"
      },
      "type": "log"
    }
  },
  {
    "_index": "filebeat-2018.03.20",
    "_type": "doc",
    "_id": "AWJEEEPDitLpAGyvfzkB",
    "_score": 1,
    "_source": {
      "@timestamp": "2018-03-27T13:09:37.000Z",
      "system": {
        "auth": {
          "hostname": "ip-10-77-20-248",
          "program": "sudo",
          "message": "pam_unix(sudo:session): session opened for user root by ubuntu(uid=0)",
          "timestamp": "Mar 27 13:09:37"
        }
      },
      "offset": 1236,
      "beat": {
        "hostname": "CASEV-611-ESML",
        "name": "test",
        "version": "5.6.8"
      },
      "input_type": "log",
      "source": "/opt/data/auth.log",
      "fileset": {
        "module": "system",
        "name": "auth"
      },
      "type": "log"
    }
  },
  {
    "_index": "filebeat-2018.03.20",
    "_type": "doc",
    "_id": "AWJEEEPDitLpAGyvfzkG",
    "_score": 1,
    "_source": {
      "@timestamp": "2018-03-27T13:10:14.000Z",
      "system": {
        "auth": {
          "hostname": "ip-10-77-20-248",
          "sudo": {
            "tty": "pts/0",
            "pwd": "/home/ubuntu",
            "user": "root",
            "command": "/usr/bin/apt-get install apt-transport-https"
          },
          "user": "ubuntu",
          "timestamp": "Mar 27 13:10:14"
        }
      },
      "offset": 1794,
      "beat": {
        "hostname": "CASEV-611-ESML",
        "name": "test",
        "version": "5.6.8"
      },
      "input_type": "log",
      "source": "/opt/data/auth.log",
      "fileset": {
        "module": "system",
        "name": "auth"
      },
      "type": "log"
    }
  },
  {
    "_index": "filebeat-2018.03.20",
    "_type": "doc",
    "_id": "AWJEEEPDitLpAGyvfzkK",
    "_score": 1,
    "_source": {
      "@timestamp": "2018-03-27T13:10:18.000Z",
      "system": {
        "auth": {
          "hostname": "ip-10-77-20-248",
          "program": "sudo",
          "message": "pam_unix(sudo:session): session opened for user root by ubuntu(uid=0)",
          "timestamp": "Mar 27 13:10:18"
        }
      },
      "offset": 2258,
      "beat": {
        "hostname": "CASEV-611-ESML",
        "name": "test",
        "version": "5.6.8"
      },
      "input_type": "log",
      "source": "/opt/data/auth.log",
      "fileset": {
        "module": "system",
        "name": "auth"
      },
      "type": "log"
    }
  },
  {
    "_index": "filebeat-2018.03.20",
    "_type": "doc",
    "_id": "AWJEEEPDitLpAGyvfzkO",
    "_score": 1,
    "_source": {
      "@timestamp": "2018-03-27T13:10:28.000Z",
      "system": {
        "auth": {
          "hostname": "ip-10-77-20-248",
          "program": "sudo",
          "message": "pam_unix(sudo:session): session closed for user root",
          "timestamp": "Mar 27 13:10:28"
        }
      },
      "offset": 2672,
      "beat": {
        "hostname": "CASEV-611-ESML",
        "name": "test",
        "version": "5.6.8"
      },
      "input_type": "log",
      "source": "/opt/data/auth.log",
      "fileset": {
        "module": "system",
        "name": "auth"
      },
      "type": "log"
    }
  },
  {
    "_index": "filebeat-2018.03.20",
    "_type": "doc",
    "_id": "AWJEEEPDitLpAGyvfzkT",
    "_score": 1,
    "_source": {
      "@timestamp": "2018-03-27T13:10:53.000Z",
      "system": {
        "auth": {
          "hostname": "ip-10-77-20-248",
          "program": "sudo",
          "message": "pam_unix(sudo:session): session opened for user root by ubuntu(uid=0)",
          "timestamp": "Mar 27 13:10:53"
        }
         .....

So I was able to run the ML job, and it processed 810 records. At first, when I went to the Anomaly Explorer tab, I was not seeing anything; I had to adjust the time frame into the future again. Thank you for your help!
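
In case it helps anyone else: the results were there all along, just outside the time picker's default range. If I have the 5.6 endpoint right, you can also confirm that results exist independently of the time picker with:

GET _xpack/ml/anomaly_detectors/suspicious_login_activity/results/buckets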
