X-Pack Anomaly Detection: multi-metric equals multivariate?

Hi.
I would like to detect anomalies using X-Pack. I have the following data:

  • location
  • user
  • event (1 event is stored in 1 document)
  • user behaviour per event (various features/fields)

I would like to detect anomalous events. Usually an anomaly is not just a single feature (e.g. a spike in a count) but a combination of several features. And the anomaly is a user behaving differently from other users at the same location, rather than a user changing his own behaviour (that's why I would use "Population" with user as the population field - should I split each detector by location, or filter my datafeed accordingly and create one job per location?).
So I was wondering if X-Pack models the relation between various features to find anomalies or only looks at each single detector?
I would also like to understand how I could find the event that was anomalous and not just the time bucket?
Thanks
bp

> I would also like to understand how I could find the event that was anomalous and not just the time bucket?

The way Elastic anomaly detection works is to create summary statistics per time bucket and detect anomalies in those summary statistics. This is what makes it scalable. It is not designed to tell you that one specific event (document) was the cause.

One thing you can do is to specify influencer fields in your analysis config. These may help to narrow down which event(s) caused the anomaly to be detected.

The workflow we envisage is that once we report an anomaly for a particular time bucket somebody who understands the input data will drill down into it using the time bucket and influencer field values as filters to see the raw document(s) that were responsible.
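As a sketch of how influencers are configured, they are listed in the job's analysis config; the field names here (user, location) and the job id are placeholders for your own fields:

```
PUT _ml/anomaly_detectors/example_population_job
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      { "function": "count", "over_field_name": "user" }
    ],
    "influencers": [ "user", "location" ]
  },
  "data_description": { "time_field": "@timestamp" }
}
```

Anomaly records then carry influencer values, which you can use as filters when drilling down to the raw documents.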

> And it is a user that behaves differently compared to other users at the same location, not changing his own behaviour (that's why I would use "Population" with user as population field

Agreed, user should be the population field, which is over_field_name in the detector config.

> should I split each detector by location or already select my datafeed accordingly and create one job per location?

That depends on the data volume. You could specify location as the partition_field_name. This might work well, or it might make the job's memory requirement so high that it is better to have one job per location.
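To illustrate the single-job option, and assuming hypothetical field names, a detector that compares each user against the population separately within each location could look like:

```
"detectors": [
  {
    "function": "count",
    "over_field_name": "user",
    "partition_field_name": "location"
  }
]
```

With partition_field_name, one job covers every location but keeps a separate population baseline per location; the alternative is one job per location, with the datafeed query filtered to that location.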

> So I was wondering if X-Pack models the relation between various features to find anomalies or only looks at each single detector?

Basically it's looking at each individual detector. The anomaly explorer view in the UI will help you see when there are multiple anomalies in the same time bucket.

One other option to consider that will take into account many features together would be to use Elastic data frame analytics to do one outlier detection job for each location, listing all your various features/fields as included fields. There's an end-to-end example with screenshots that might give you a better idea of whether this would be more suitable for your use case than our anomaly detection functionality.
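As a rough sketch of that per-location setup (the index names, the location value, and the feature field names are all placeholders):

```
PUT _ml/data_frame/analytics/outliers_location_a
{
  "source": {
    "index": "events",
    "query": { "term": { "location": "A" } }
  },
  "dest": { "index": "events_outliers_location_a" },
  "analysis": { "outlier_detection": {} },
  "analyzed_fields": {
    "includes": [ "feature_1", "feature_2", "feature_3" ]
  }
}
```

Each document in the destination index then receives an outlier score, so anomalous events can be identified individually rather than per time bucket.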

Thank you for your reply.

I was trying to create a new outlier detection job, but I get an error when using an existing index as the destination index (validation_exception), and also when I enter a new one (index_not_found_exception). What do you recommend - what should the destination index look like? I also don't see an option for "included fields", but I could use the API to define them.

Hi @boxplot

I'm not sure what version of Elasticsearch you are using, or whether you are using the UI or the API, so it's a little difficult to answer your questions directly, but I can provide some background.

By default, data frame analytics will create the destination index using mappings and settings from the source index. This is the preferred approach. The validation exception you saw is likely related to invalid mappings being set, and can be avoided by simply letting data frame analytics create the destination index.

If you are still seeing the index_not_found_exception, please include the steps to replicate it. Did this happen on _start, or when you created the analytics job? Could this error relate to the source index?

Included fields can be set via the API, or by switching to the Advanced Editor in the UI, where you can edit the JSON directly and specify included fields.

Best wishes
Sophie

Hi @sophie_chang

I'm using the UI with Kibana 7.8.0.

After creating the dfa job, I get the following message when I click on view (index_dest_dfa does not exist prior to the job creation):

Not Found: [index_not_found_exception] no such index [index_dest_dfa], with { resource.type="index_or_alias" & resource.id="index_dest_dfa" & index_uuid="_na_" & index="index_dest_dfa" }

Thanks,
boxplot

Please check in Dev Tools: does index_dest_dfa exist? e.g. try GET _cat/indices

If it does not exist, then it's likely that there was another error message earlier that would have happened when the DFA job was created or started.

You may see a more explanatory message when you try to start the job.

Also, what are the job state and the job progress?
https://www.elastic.co/guide/en/elasticsearch/reference/7.8/get-dfanalytics-stats.html#ml-get-dfanalytics-stats-example
In order to view results, I would expect the job state to be stopped and the progress to be 100 in all phases.
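For example, in Dev Tools (the job id here is a placeholder):

```
GET _ml/data_frame/analytics/index_dest_dfa_job/_stats
```

Once the job has finished, the response should show "state": "stopped" and, under progress, "progress_percent": 100 for each phase.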

If there is nothing obvious, could you please try and step through the wizard again and see if there was an earlier error message?

It is quite obvious: there is not enough memory available for the job. Even if I choose a small number of documents, it doesn't run:

Job messages:
Estimated memory usage for this analytics to be [2mb]

No node found to start analytics. Reasons [Not opening job [45698] on node [{xxx}{ml.machine_memory=17061904384}{ml.max_open_jobs=20}], because this node has insufficient available memory. Available memory for ML [5118571315], memory required by existing jobs [0], estimated memory required for this job [16134438912]]

Do I always need to have that amount of memory available, even for a small job?

Hi @boxplot

Were those two messages from the same job? It seems strange to have a 2mb estimate while the error at start indicates a 16gb estimate. A job requiring 2mb should easily fit on that node.

As I am not sure how these errors came about, let me explain some background which I hope will help you when stepping through and creating your analysis.

  1. The job configuration has a setting model_memory_limit. This limits the memory used by the job during its analysis and is an upper limit which helps protect node stability. The default is 1gb, which is fine for many analyses, especially during ML product evaluations. https://www.elastic.co/guide/en/elasticsearch/reference/7.8/put-dfanalytics.html

  2. To help users know what value to set this to, we also have an explain API which estimates the memory required. https://www.elastic.co/guide/en/elasticsearch/reference/7.8/explain-dfanalytics.html

  3. When using the UI wizard to create a DFA job, we call the explain API in the background and set the job's model_memory_limit accordingly. You can override this.

  4. When the job starts, there is an additional backend check against the estimate again. This will stop the job from starting if the model_memory_limit is less than the estimate. (note - we plan to remove this check in a later version because we would prefer not to fail to start the job based on an estimated value).
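As a sketch, the explain API takes the same configuration body you would use to create the job (the source index and analysis here are placeholders):

```
POST _ml/data_frame/analytics/_explain
{
  "source": { "index": "events" },
  "analysis": { "outlier_detection": {} }
}
```

The response includes a memory_estimation object whose values can guide the model_memory_limit you set on the job.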

Here are a few suggestions for next steps:

For outlier detection, a job will typically need more memory if there are many features in the source data. Consider using the includes or excludes fields to limit the analysis to relevant features, if you can, and potentially downsample your source data. This will allow the analysis to fit within the roughly 15gb that appears to be available on the node. Note - a job with a 16gb model estimate is likely to take a few hours to complete. As you are trialing data frame analytics, and iterations are likely, it's best to start small.

It is possible something strange is happening in the UI workflow which is causing an incompatible estimate to be applied to the job, or perhaps the estimate is not being refreshed after changing some of the wizard fields, or perhaps the backend is producing an inaccurate estimate. When stepping through the wizard, please double check the memory estimate and the value applied for model_memory_limit. The model_memory_limit needs to be less than 15gb and larger than the estimate for the job to start.
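For example, if the estimate looks reasonable, the limit can be set explicitly in the job configuration (the value shown is purely illustrative):

```
"model_memory_limit": "100mb"
```

In the UI, the same key can be edited directly in the Advanced Editor JSON.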

Also, please consider stepping through the outlier example in our documentation. It is a smaller analysis job, and if you can step through it successfully, it will show that there is nothing in your cluster configuration impacting data frame analytics memory management. https://www.elastic.co/guide/en/machine-learning/7.8/ecommerce-outliers.html

Hi @sophie_chang

Yes, they were from the same job.

If I try to lower the limit in the UI, the following message appears:
Model memory limit cannot be lower than 15352mb
When using the advanced editor, I was able to lower the limit, and the job finished successfully.

Thank you for your support!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.