Security Analytics Recipes


(senthil) #1

Hi,
I am trying to explore the example https://github.com/elastic/examples/blob/master/Machine%20Learning/Security%20Analytics%20Recipes/dns_data_exfiltration/EXAMPLE.md

for Anomaly detection. However, I can't seem to get the results in the explorer. The datafeed job has been running for about a day now and it has processed about 32k records. I have followed all the instructions in that github page but I am not sure how long this job needs to run to get a result in the Anomaly explorer for exploration purpose. Can someone please help?

Thanks,
Senthil.


ML Datafeed lookback retrieved no data
(Sophie Chang) #2

Hi

Missing anomaly results could be due to a few factors. The following checks cover the most common causes...

  1. Check for sufficient data

For typical data, the job needs to run for at least 2 hours or 20 buckets (whichever is longer) for the model to initialize and results to be written. I would expect sufficient run time in this instance.
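As a back-of-the-envelope check, the rule of thumb above can be sketched as follows (`min_run_time` is a hypothetical helper for illustration, not part of any Elastic API):

```python
from datetime import timedelta

def min_run_time(bucket_span: timedelta, min_buckets: int = 20,
                 floor: timedelta = timedelta(hours=2)) -> timedelta:
    """Return the longer of the 2-hour floor and `min_buckets` bucket spans."""
    return max(floor, min_buckets * bucket_span)

# The job in this thread uses a 5-minute bucket_span:
print(min_run_time(timedelta(minutes=5)))   # 20 buckets = 100 min, so the 2h floor wins
print(min_run_time(timedelta(minutes=10)))  # 200 min, which exceeds the floor
```

After a day of running with a 5-minute bucket span, run time is clearly not the limiting factor here.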

  2. Check results exist

The Anomaly Explorer shows anomalies. It is possible that the data simply does not contain anomalies for this time period, so check whether any results are being written to Elasticsearch. In Dev Tools run:

GET _xpack/ml/anomaly_detectors/<job_id>/results/buckets

One document should be returned per bucket timestamp, even if no anomalies are found. Check that the timestamps of the results match the Kibana time picker range.
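To sanity-check that bucket results are contiguous, here is a minimal sketch. It assumes the response shape documented for the buckets endpoint (a list of bucket documents, each with an epoch-millisecond `timestamp`); `missing_buckets` is a hypothetical helper:

```python
def missing_buckets(buckets, bucket_span_ms):
    """Return timestamps after which a gap larger than one bucket_span appears."""
    gaps = []
    for prev, cur in zip(buckets, buckets[1:]):
        if cur["timestamp"] - prev["timestamp"] > bucket_span_ms:
            gaps.append(prev["timestamp"])
    return gaps

# A 5m (300000 ms) bucket_span with one bucket missing between the 2nd and 3rd:
sample = [{"timestamp": 0}, {"timestamp": 300000}, {"timestamp": 900000}]
print(missing_buckets(sample, 300000))  # [300000]
```

An empty list means every bucket in the queried range produced a result document.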

  3. Check input data

The input data fields may be missing. The datafeed may be processing records, but if many values are missing, no data is being modelled and consequently there are no results.

To check what the data feed is sending for analysis, run:

GET _xpack/ml/datafeeds/datafeed-<job_id>/_preview

Check that values are returned for all of the fields that are used in the job configuration.
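The preview check above can be automated with a small sketch. The field names match the job configuration later in this thread; `docs_missing_fields` is a hypothetical helper applied to the parsed `_preview` response:

```python
# Fields the job's detector and influencers rely on:
REQUIRED = {"@timestamp", "system.auth.hostname",
            "system.auth.user", "system.auth.ssh.ip"}

def docs_missing_fields(preview_docs):
    """Return (index, missing-fields) pairs for incomplete preview documents."""
    problems = []
    for i, doc in enumerate(preview_docs):
        missing = REQUIRED - doc.keys()
        if missing:
            problems.append((i, sorted(missing)))
    return problems

preview = [
    {"@timestamp": 1454803200000, "system.auth.hostname": "hostname1",
     "system.auth.user": "thomas", "system.auth.ssh.ip": "10.2.3.14"},
    {"@timestamp": 1454803260000, "system.auth.hostname": "hostname1"},
]
print(docs_missing_fields(preview))
# [(1, ['system.auth.ssh.ip', 'system.auth.user'])]
```

If a large share of documents report missing fields, the job has little or nothing to model.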

If all of the above checks seem OK, then please send us the job info. In the Job Management page, expand the job entry, select the JSON tab, and include that content in your reply so we can take a look.

Thanks


(senthil) #3

Hi,
Thanks for your help. Here is the JSON.

{
  "job_id": "suspicious_login_activity",
  "job_type": "anomaly_detector",
  "description": "suspicious login activity",
  "create_time": 1501018984377,
  "finished_time": 1501019008271,
  "analysis_config": {
    "bucket_span": "5m",
    "detectors": [
      {
        "detector_description": "high_count",
        "function": "high_count",
        "partition_field_name": "system.auth.hostname",
        "detector_rules": []
      }
    ],
    "influencers": [
      "system.auth.hostname",
      "system.auth.user",
      "system.auth.ssh.ip"
    ]
  },
  "data_description": {
    "time_field": "@timestamp",
    "time_format": "epoch_ms"
  },
  "model_plot_config": {
    "enabled": true
  },
  "model_snapshot_retention_days": 1,
  "results_index_name": "shared",
  "data_counts": {
    "job_id": "suspicious_login_activity",
    "processed_record_count": 0,
    "processed_field_count": 0,
    "input_bytes": 0,
    "input_field_count": 0,
    "invalid_date_count": 0,
    "missing_field_count": 0,
    "out_of_order_timestamp_count": 0,
    "empty_bucket_count": 0,
    "sparse_bucket_count": 0,
    "bucket_count": 0,
    "input_record_count": 0
  },
  "model_size_stats": {
    "job_id": "suspicious_login_activity",
    "result_type": "model_size_stats",
    "model_bytes": 0,
    "total_by_field_count": 0,
    "total_over_field_count": 0,
    "total_partition_field_count": 0,
    "bucket_allocation_failures_count": 0,
    "memory_status": "ok",
    "log_time": 1501019007000,
    "timestamp": -300000
  },
  "datafeed_config": {
    "datafeed_id": "datafeed-suspicious_login_activity",
    "job_id": "suspicious_login_activity",
    "query_delay": "60s",
    "frequency": "150s",
    "indexes": [
      "filebeat"
    ],
    "types": [
      "doc"
    ],
    "query": {
      "query_string": {
        "query": "system.auth.ssh.event:Failed OR system.auth.ssh.event:Invalid",
        "fields": [],
        "use_dis_max": true,
        "tie_breaker": 0,
        "default_operator": "or",
        "auto_generate_phrase_queries": false,
        "max_determinized_states": 10000,
        "enable_position_increments": true,
        "fuzziness": "AUTO",
        "fuzzy_prefix_length": 0,
        "fuzzy_max_expansions": 50,
        "phrase_slop": 0,
        "analyze_wildcard": true,
        "escape": false,
        "split_on_whitespace": true,
        "boost": 1
      }
    },
    "scroll_size": 1000,
    "chunking_config": {
      "mode": "auto"
    },
    "state": "stopped"
  },
  "state": "opened",
  "node": {
    "id": "jEnDRnUqSt22_CsGI9BzSA",
    "name": "Oculus",
    "ephemeral_id": "QKO13r64Sv-dN0etmOfNag",
    "transport_address": "x.x.x.x:9300",
    "attributes": {
      "ml.enabled": "true"
    }
  },
  "open_time": "0s"
}
Basically we are trying to implement this recipe here.

The data has been ingested into ES, but we are not able to get the datafeed to process the index.

Thanks again.


(Sophie Chang) #4

Hi

From the details you have provided, the datafeed looks like it is currently stopped and zero records have been processed. I presume this is a config from a recently created job.

Can you please confirm what the output from the following looked like:

GET _xpack/ml/datafeeds/datafeed-suspicious_login_activity/_preview

I would expect this to contain data for the following fields:

...
  {
    "@timestamp": 1454803200000,
    "system.auth.hostname": "hostname1",
    "system.auth.user": "thomas",
    "system.auth.ssh.ip": "10.2.3.14"
  },
...

Was this job running in real-time, or did you select a historical start date, e.g. something like 2 weeks ago?

If you are running in real-time, then the datafeed runs every 150s and selects data with timestamps more than 60s old. If your data takes longer than 60s to ingest, then you'll need to adjust some of the datafeed settings. You can do this in the UI: in Job Management, select the Edit icon for this job and click on the Datafeed tab. Then edit:

"query_delay": "60s",
"frequency": "150s",

(You'll need to stop the datafeed first).
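The interaction between `query_delay` and ingest lag can be illustrated with a deliberately simplified sketch (it ignores the extra slack that `frequency` provides, and `event_missed` is a hypothetical helper, not an Elastic API):

```python
def event_missed(query_delay_s: float, ingest_lag_s: float) -> bool:
    """An event is permanently skipped when it only becomes searchable after
    the datafeed has already advanced past its timestamp, i.e. roughly when
    the ingest lag exceeds query_delay."""
    return ingest_lag_s > query_delay_s

print(event_missed(60, 45))   # False - the document is indexed in time
print(event_missed(60, 120))  # True  - query_delay would need raising to >= 120s
```

In other words, `query_delay` should comfortably exceed your worst-case ingest latency.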

If you were running on historical data, (i.e. you selected a datafeed start time that was fairly far in the past), then the datafeed preview should hold the answer. Are the correct fields being returned for analysis?

Regards


(senthil) #5

Hi,
Thanks so much for your response.

How exactly should the indexes in ES be created? I don't see these fields in ES.

{
  "@timestamp": 1454803200000,
  "system.auth.hostname": "hostname1",
  "system.auth.user": "thomas",
  "system.auth.ssh.ip": "10.2.3.14"
},

The provided sample dataset auth.log doesn't quite match these fields after being ingested through filebeat. I am running on historical data based on the dataset from the security recipe.


(Sophie Chang) #6

Could you please check on the names of the filebeat indices in your elasticsearch instance?

GET _cat/indices/fileb*

It is possible that the filebeat indices have the date pattern in the name e.g. filebeat-2017.07.27. If this is the case, then I see a problem with the datafeed configuration.

The config below:

"indexes": [
"filebeat"
],

... should be:

"indexes": [
"filebeat-*"
],
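The difference between the two patterns can be illustrated with Python's `fnmatch`, which approximates the wildcard semantics of Elasticsearch index patterns:

```python
from fnmatch import fnmatch

indices = ["filebeat-2017.07.27", "filebeat-2017.07.28"]

# A bare "filebeat" only matches an index literally named "filebeat":
print([i for i in indices if fnmatch(i, "filebeat")])    # []

# "filebeat-*" matches the date-suffixed indices Filebeat actually creates:
print([i for i in indices if fnmatch(i, "filebeat-*")])  # both indices
```

So with `"indexes": ["filebeat"]`, the datafeed searches a non-existent index and processes zero records, which matches the `data_counts` in the job JSON above.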

To correct this, using the UI, clone the ML job.

  • In the Job Details tab
    -- Give it a new distinct name e.g. suspicious_login_activity_2
    -- Click on Use dedicated index (this is not strictly necessary, but may avoid other issues for the purposes of troubleshooting)
  • Go to the Datafeed tab
    -- Change the Index from filebeat to filebeat-*. Make sure the Time-field name is selected as well as All types. e.g.
    [screenshot: Datafeed tab showing index, Time-field name and types]
    (Note: You may have different values for time-field name and a different list of types - just pick from the list that is pre-populated).
  • Go to the Datafeed Preview tab - here you should be able to see a sample of the data to be analyzed.
  • Click Save
  • Click Start datafeed
    -- Select to run from the beginning of the data to Now (or continue in real-time)

Hope this gets you a little closer. I'll ask the Examples team to double check the recipe is working end to end. It really should be updated to 5.5 by now.

Regards
Sophie


(system) #7

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.