Anomaly detection for web site visitor surge (sparse data)

J_Reinhardt · December 16, 2024, 4:45pm

I'm trying to use Elastic anomaly detection to identify when a surge in website visitor activity occurs for a particular IP address.

The data set is really simple: it just shows the number of times an IP address has visited the site each hour.
_time,src,visits
2024-09-27T01:00:00.000-0400,199.58.35.150,15
2024-09-27T02:00:00.000-0400,199.58.35.150,25
2024-09-27T03:00:00.000-0400,199.58.35.150,20

So for the week of Sunday, September 22 through Saturday, September 28, the activity above should be anomalous, since the IP ending in .150 visited the site on September 27 but had no other visits earlier in the week.

The problem is that the data has too many missing documents to use High mean(visits) in the anomaly detection job, unless I insert billions of extra documents that set visits=0 for every hour and every possible IP address.

Is there another way to make this ML anomaly detection use case work, without having to use a gap_policy or write custom code to fill gaps with zeroes?

richcollier · December 17, 2024, 3:31pm

Analyzing data from high-cardinality entities (like IP addresses) is tricky because:

There may be 1M+ entities, and analyzing every entity over all time does not scale easily.
Data may be sparse for individual entities.
If a particular entity shows up for the first time and immediately does something anomalous, you cannot tell whether that behavior is anomalous for that entity, because you have no prior history of that entity to judge against.

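Therefore, look into using Population Analysis instead, which compares each entity against the behavior of the rest of the population rather than against its own history.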
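
As a rough illustration of what a population job for this data might look like, here is a minimal sketch that builds a request body for the create anomaly detection job API. The field names `visits`, `src`, and `_time` come from the sample data in the question; the choice of `high_sum`, the one-hour bucket span, and the helper name `build_population_job` are assumptions made for the example, not a recommendation from this thread. The important part is `over_field_name`, which is what makes the detector a population analysis.

```python
import json


def build_population_job(bucket_span="1h", time_field="_time"):
    """Build a request body for PUT _ml/anomaly_detectors/<job_id>.

    Hypothetical helper for illustration only. The time field default
    mirrors the header of the sample data and may differ in a real index.
    """
    return {
        "description": "High visit counts per IP relative to the population of IPs",
        "analysis_config": {
            "bucket_span": bucket_span,
            "detectors": [
                {
                    # Assumed function: flag unusually high totals of `visits`.
                    "function": "high_sum",
                    "field_name": "visits",
                    # over_field_name (not by_field_name) makes this a
                    # population analysis: each src is compared with its peers.
                    "over_field_name": "src",
                }
            ],
            "influencers": ["src"],
        },
        "data_description": {"time_field": time_field},
    }


job = build_population_job()
print(json.dumps(job, indent=2))
```

Because each IP is modeled against the population, hours with no documents for a given IP do not need to be back-filled with zero-valued documents.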
