Security Analytics Recipes


(senthil) #1

Hi,
I am trying to explore the example https://github.com/elastic/examples/blob/master/Machine%20Learning/Security%20Analytics%20Recipes/dns_data_exfiltration/EXAMPLE.md

for Anomaly detection. However, I can't seem to get the results in the explorer. The datafeed job has been running for about a day now and it has processed about 32k records. I have followed all the instructions in that github page but I am not sure how long this job needs to run to get a result in the Anomaly explorer for exploration purpose. Can someone please help?

Thanks,
Senthil.


ML Datafeed lookback retrieved no data
(Sophie Chang) #2

Hi

Missing anomaly results could be due to a few factors. The following checks cover the most common causes...

  1. Check for sufficient data

For typical data, the job needs to run for at least 2 hours or 20 buckets (whichever is longer) for the model to initialize and results to be written. I would expect sufficient run time in this instance.
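As a back-of-the-envelope check, the rule of thumb above can be sketched as follows (`min_run_time` is a hypothetical helper for illustration, not part of any Elastic API):

```python
from datetime import timedelta

def min_run_time(bucket_span: timedelta, min_buckets: int = 20,
                 floor: timedelta = timedelta(hours=2)) -> timedelta:
    """Return the longer of the 2-hour floor and `min_buckets` bucket spans."""
    return max(floor, min_buckets * bucket_span)

# The job in this thread uses a 5-minute bucket_span:
print(min_run_time(timedelta(minutes=5)))   # 20 buckets = 100 min, so the 2h floor wins
print(min_run_time(timedelta(minutes=10)))  # 200 min, which exceeds the floor
```

After a day of running with a 5-minute bucket span, run time is clearly not the limiting factor here.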

  2. Check results exist

The Anomaly Explorer shows anomalies. It is possible that the data simply does not contain anomalies for this time period, so check whether any results are being written to Elasticsearch. In Dev Tools run:

GET _xpack/ml/anomaly_detectors/<job_id>/results/buckets

One document should be returned per bucket timestamp, even if no anomalies are found. Check that the timestamps of the results match the Kibana time picker range.
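To sanity-check that bucket results are contiguous, here is a minimal sketch. It assumes the response shape documented for the buckets endpoint (a list of bucket documents, each with an epoch-millisecond `timestamp`); `missing_buckets` is a hypothetical helper:

```python
def missing_buckets(buckets, bucket_span_ms):
    """Return timestamps after which a gap larger than one bucket_span appears."""
    gaps = []
    for prev, cur in zip(buckets, buckets[1:]):
        if cur["timestamp"] - prev["timestamp"] > bucket_span_ms:
            gaps.append(prev["timestamp"])
    return gaps

# A 5m (300000 ms) bucket_span with one bucket missing between the 2nd and 3rd:
sample = [{"timestamp": 0}, {"timestamp": 300000}, {"timestamp": 900000}]
print(missing_buckets(sample, 300000))  # [300000]
```

An empty list means every bucket in the queried range produced a result document.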

  3. Check input data

The input data fields may be missing. The datafeed may be processing records, but if many values are missing, no data is being modelled and consequently there are no results.

To check what the data feed is sending for analysis, run:

GET _xpack/ml/datafeeds/datafeed-<job_id>/_preview

Check that values are returned for all of the fields that are used in the job configuration.
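The preview check above can be automated with a small sketch. The field names match the job configuration later in this thread; `docs_missing_fields` is a hypothetical helper applied to the parsed `_preview` response:

```python
# Fields the job's detector and influencers rely on:
REQUIRED = {"@timestamp", "system.auth.hostname",
            "system.auth.user", "system.auth.ssh.ip"}

def docs_missing_fields(preview_docs):
    """Return (index, missing-fields) pairs for incomplete preview documents."""
    problems = []
    for i, doc in enumerate(preview_docs):
        missing = REQUIRED - doc.keys()
        if missing:
            problems.append((i, sorted(missing)))
    return problems

preview = [
    {"@timestamp": 1454803200000, "system.auth.hostname": "hostname1",
     "system.auth.user": "thomas", "system.auth.ssh.ip": "10.2.3.14"},
    {"@timestamp": 1454803260000, "system.auth.hostname": "hostname1"},
]
print(docs_missing_fields(preview))
# [(1, ['system.auth.ssh.ip', 'system.auth.user'])]
```

If a large share of documents report missing fields, the job has little or nothing to model.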

If all of the above checks seem OK, then please send us the job info. In the Job Management page, expand the job entry, select the JSON tab, and include that content in your reply so we can take a look.

Thanks


(senthil) #3

Hi,
Thanks for your help. Here is the JSON.

{
  "job_id": "suspicious_login_activity",
  "job_type": "anomaly_detector",
  "description": "suspicious login activity",
  "create_time": 1501018984377,
  "finished_time": 1501019008271,
  "analysis_config": {
    "bucket_span": "5m",
    "detectors": [
      {
        "detector_description": "high_count",
        "function": "high_count",
        "partition_field_name": "system.auth.hostname",
        "detector_rules": []
      }
    ],
    "influencers": [
      "system.auth.hostname",
      "system.auth.user",
      "system.auth.ssh.ip"
    ]
  },
  "data_description": {
    "time_field": "@timestamp",
    "time_format": "epoch_ms"
  },
  "model_plot_config": {
    "enabled": true
  },
  "model_snapshot_retention_days": 1,
  "results_index_name": "shared",
  "data_counts": {
    "job_id": "suspicious_login_activity",
    "processed_record_count": 0,
    "processed_field_count": 0,
    "input_bytes": 0,
    "input_field_count": 0,
    "invalid_date_count": 0,
    "missing_field_count": 0,
    "out_of_order_timestamp_count": 0,
    "empty_bucket_count": 0,
    "sparse_bucket_count": 0,
    "bucket_count": 0,
    "input_record_count": 0
  },
  "model_size_stats": {
    "job_id": "suspicious_login_activity",
    "result_type": "model_size_stats",
    "model_bytes": 0,
    "total_by_field_count": 0,
    "total_over_field_count": 0,
    "total_partition_field_count": 0,
    "bucket_allocation_failures_count": 0,
    "memory_status": "ok",
    "log_time": 1501019007000,
    "timestamp": -300000
  },
  "datafeed_config": {
    "datafeed_id": "datafeed-suspicious_login_activity",
    "job_id": "suspicious_login_activity",
    "query_delay": "60s",
    "frequency": "150s",
    "indexes": [
      "filebeat"
    ],
    "types": [
      "doc"
    ],
    "query": {
      "query_string": {
        "query": "system.auth.ssh.event:Failed OR system.auth.ssh.event:Invalid",
        "fields": [],
        "use_dis_max": true,
        "tie_breaker": 0,
        "default_operator": "or",
        "auto_generate_phrase_queries": false,
        "max_determinized_states": 10000,
        "enable_position_increments": true,
        "fuzziness": "AUTO",
        "fuzzy_prefix_length": 0,
        "fuzzy_max_expansions": 50,
        "phrase_slop": 0,
        "analyze_wildcard": true,
        "escape": false,
        "split_on_whitespace": true,
        "boost": 1
      }
    },
    "scroll_size": 1000,
    "chunking_config": {
      "mode": "auto"
    },
    "state": "stopped"
  },
  "state": "opened",
  "node": {
    "id": "jEnDRnUqSt22_CsGI9BzSA",
    "name": "Oculus",
    "ephemeral_id": "QKO13r64Sv-dN0etmOfNag",
    "transport_address": "x.x.x.x:9300",
    "attributes": {
      "ml.enabled": "true"
    }
  },
  "open_time": "0s"
}
Basically we are trying to implement this recipe here.

The data has been ingested into ES, but we are not able to get the datafeed to process the index.

Thanks again.


(Sophie Chang) #4

Hi

From the details you have provided, the datafeed looks like it is currently stopped and zero records have been processed. I presume this is a config from a recently created job.

Can you please confirm what the output from the following looked like:

GET _xpack/ml/datafeeds/datafeed-suspicious_login_activity/_preview

I would expect this to contain data for the following fields:

...
  {
    "@timestamp": 1454803200000,
    "system.auth.hostname": "hostname1",
    "system.auth.user": "thomas",
    "system.auth.ssh.ip": "10.2.3.14"
  },
...

Was this job running in real-time, or did you select a historical start date, e.g. something like 2 weeks ago?

If you are running in real-time, then the datafeed runs every 150s and selects data with timestamps more than 60s old. If your data takes longer than 60s to ingest, then you'll need to adjust some of the datafeed settings. You can do this in the UI: in Job Management, select the Edit icon for this job and click on the Datafeed tab. Then edit:

"query_delay": "60s",
"frequency": "150s",

(You'll need to stop the datafeed first).
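The interaction between `query_delay` and ingest lag can be illustrated with a deliberately simplified sketch (it ignores the extra slack that `frequency` provides, and `event_missed` is a hypothetical helper, not an Elastic API):

```python
def event_missed(query_delay_s: float, ingest_lag_s: float) -> bool:
    """An event is permanently skipped when it only becomes searchable after
    the datafeed has already advanced past its timestamp, i.e. roughly when
    the ingest lag exceeds query_delay."""
    return ingest_lag_s > query_delay_s

print(event_missed(60, 45))   # False - the document is indexed in time
print(event_missed(60, 120))  # True  - query_delay would need raising to >= 120s
```

In other words, `query_delay` should comfortably exceed your worst-case ingest latency.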

If you were running on historical data, (i.e. you selected a datafeed start time that was fairly far in the past), then the datafeed preview should hold the answer. Are the correct fields being returned for analysis?

Regards


(senthil) #5

Hi,
Thanks so much for your response.

How exactly should the indexes in ES be created? I don't see these fields in ES.

{
  "@timestamp": 1454803200000,
  "system.auth.hostname": "hostname1",
  "system.auth.user": "thomas",
  "system.auth.ssh.ip": "10.2.3.14"
},

The provided sample dataset auth.log doesn't quite match these fields after being ingested through filebeat. I am running on historical data based on the dataset from the security recipe.


(Sophie Chang) #6

Could you please check on the names of the filebeat indices in your elasticsearch instance?

GET _cat/indices/fileb*

It is possible that the filebeat indices have the date pattern in the name e.g. filebeat-2017.07.27. If this is the case, then I see a problem with the datafeed configuration.

The config below:

"indexes": [
"filebeat"
],

... should be:

"indexes": [
"filebeat-*"
],
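The difference between the two patterns can be illustrated with Python's `fnmatch`, which approximates the wildcard semantics of Elasticsearch index patterns:

```python
from fnmatch import fnmatch

indices = ["filebeat-2017.07.27", "filebeat-2017.07.28"]

# A bare "filebeat" only matches an index literally named "filebeat":
print([i for i in indices if fnmatch(i, "filebeat")])    # []

# "filebeat-*" matches the date-suffixed indices Filebeat actually creates:
print([i for i in indices if fnmatch(i, "filebeat-*")])  # both indices
```

So with `"indexes": ["filebeat"]`, the datafeed searches a non-existent index and processes zero records, which matches the `data_counts` in the job JSON above.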

To correct this, using the UI, clone the ML job.

  • In the Job Details tab
    -- Give it a new distinct name e.g. suspicious_login_activity_2
    -- Click on Use dedicated index (this is not strictly necessary, but may avoid other issues for the purposes of troubleshooting)
  • Go to the Datafeed tab
    -- Change the Index from filebeat to filebeat-*. Make sure the Time-field name is selected as well as All types. e.g.
    [screenshot: Datafeed tab showing index, Time-field name and types]
    (Note: You may have different values for time-field name and a different list of types - just pick from the list that is pre-populated).
  • Go to the Datafeed Preview tab - here you should be able to see a sample of the data to be analyzed.
  • Click Save
  • Click Start datafeed
    -- Select to run from the beginning of the data to Now (or continue in real-time)

Hope this gets you a little closer. I'll ask the Examples team to double check the recipe is working end to end. It really should be updated to 5.5 by now.

Regards
Sophie


(system) #7

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.