Anomaly detection Population Job

NamithaJ97 · September 6, 2023, 1:44pm

country  state  child_count
a        a_s1   302
a        a_s2   310
a        a_s3   308
a        a_s4   21
b        b_s1   14
b        b_s2   16
b        b_s3   17
b        b_s4   218

I have a population anomaly detection job to find anomalies in child_count. The table above shows the type of data we are processing. I want to find the anomalies in child_count across the states. As you can see, country 'a' has child_count values around 300 and one state with a child_count of 21, which can be treated as an anomaly (compared with the other values for country 'a'). Country 'b' has child_count values in the range 14 to 17 and one state with 218, which is also an anomaly. There are no other anomalies in this case.

But after processing the data with a population job where the data is split by 'state', it flags the entire data of one country as anomalous by comparing it with the first country. I don't want to compare the child_count of one country with another; I just want to compare it with the previous child_count values of the same country. How can I achieve this?

(The actual data has high-cardinality values, which is why I used a population job here.)
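
To make the expected result concrete, here is a minimal plain-Python sketch of "compare each value only with its own country". It is not what an Elastic ML job does internally; it uses a median/MAD score, and the function name `per_country_outliers` and the 3.5 threshold are assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import median

# Sample rows from the table above: (country, state, child_count).
ROWS = [
    ("a", "a_s1", 302), ("a", "a_s2", 310), ("a", "a_s3", 308), ("a", "a_s4", 21),
    ("b", "b_s1", 14), ("b", "b_s2", 16), ("b", "b_s3", 17), ("b", "b_s4", 218),
]


def per_country_outliers(rows, threshold=3.5):
    """Flag rows whose child_count is far from the median of their own country."""
    by_country = defaultdict(list)
    for country, state, count in rows:
        by_country[country].append((state, count))

    flagged = []
    for country, entries in by_country.items():
        counts = [count for _, count in entries]
        center = median(counts)
        # Median absolute deviation: a spread estimate that one extreme value cannot dominate.
        mad = median([abs(c - center) for c in counts])
        if mad == 0:
            continue  # all values (nearly) identical; nothing to score against
        for state, count in entries:
            score = 0.6745 * abs(count - center) / mad  # "modified z-score"
            if score > threshold:
                flagged.append((country, state, count))
    return flagged


print(per_country_outliers(ROWS))
# -> [('a', 'a_s4', 21), ('b', 'b_s4', 218)]
```

Only a_s4 and b_s4 are flagged, because each value is scored against the other states of the same country rather than against all rows.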

richcollier · September 6, 2023, 6:06pm

First and foremost - does your data also include a timestamp? If the data isn't really temporal in nature, then you should consider doing an Outlier Detection analysis rather than a Population Analysis.

NamithaJ97 · September 7, 2023, 4:49am

My data includes a timestamp. So can I make use of outlier detection?

NamithaJ97 · September 8, 2023, 7:36am

please reply @richcollier

richcollier · September 11, 2023, 4:38pm

It depends. Outlier detection is an analysis that is mostly irrespective of time. Your data can come from a certain time period (e.g. house sale prices from 2022).

Population analysis, however, is meant to be a moment-by-moment analysis: it takes every entity witnessed in an arbitrary time window (as in "last hour" or "last day") and compares those entities against a learned "global" model of all entities that has been built up over time (ever since the Population Analysis job has been running).
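
As a rough sketch of how a population detector might be scoped to one country at a time: in the anomaly detection job API, `over_field_name` defines the population and `partition_field_name` gives each value its own independent baseline. The field names come from the sample table earlier in this thread; the `mean` function, the `1d` bucket span, and the influencer list are assumptions, so adjust them to your data.

```python
import json

# Hypothetical detector: states form the population, modeled separately per country.
detector = {
    "detector_description": "mean(child_count) over state, partitioned by country",
    "function": "mean",                          # assumption: choose the function that fits
    "field_name": "child_count",
    "over_field_name": "state.keyword",          # the population being compared
    "partition_field_name": "country.keyword",   # independent baseline per country
}

analysis_config = {
    "bucket_span": "1d",                         # assumption: depends on data frequency
    "detectors": [detector],
    "influencers": ["country.keyword", "state.keyword"],
}

# Print the fragment that would go into the job creation request body.
print(json.dumps({"analysis_config": analysis_config}, indent=2))
```

With this layout, a state in country 'a' would be compared with the other states of country 'a', not with the states of country 'b'.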

NamithaJ97 · September 15, 2023, 6:05am

OK, thanks. Can I use multiple arguments inside "by_field_name"? I want to split the data based on 3 field values (state, country, district) and analyze the splits with respect to their own history in a population job.

"detectors": [
  {
    "detector_description": "min(child_count) by \"country.keyword\"",
    "function": "min",
    "field_name": "child_count",
    "by_field_name": ["country.keyword,state.keyword,district.keyword"],
    "detector_index": 0
  }
]

But the above throws an error. How can I give multiple arguments and split the analysis based on them?

richcollier · September 18, 2023, 12:07pm

There are two fields that allow splits:

partition_field_name
by_field_name

But if you need to split more, you'll have to rely on some other methods. Namely, you'll have to create a runtime field that is a concatenation of two (or more) fields, something like the "Concatenate two fields" example from elastic content share.

Then choose the runtime field as the field to split on. Word of caution: don't split the data too thin. You might wind up with very few data points for each unique combination and thus have sparse data.
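
A minimal sketch of the runtime-field approach described above, assuming the three field names from the question and a datafeed that accepts `runtime_mappings`. The runtime field name `country_state_district`, the `|` separator, and the helper `concat_script` are hypothetical. Note that `by_field_name` takes a single string, not a list.

```python
import json

SPLIT_FIELDS = ["country.keyword", "state.keyword", "district.keyword"]
RUNTIME_FIELD = "country_state_district"  # hypothetical name for the combined field


def concat_script(fields, sep="|"):
    """Build a Painless script that emits the field values joined by `sep`."""
    # doc['field'].value fails on documents missing the field; guard for that if needed.
    parts = [f"doc['{field}'].value" for field in fields]
    return "emit(" + f" + '{sep}' + ".join(parts) + ")"


# Fragment for the datafeed configuration: defines the concatenated keyword field.
datafeed_fragment = {
    "runtime_mappings": {
        RUNTIME_FIELD: {
            "type": "keyword",
            "script": {"source": concat_script(SPLIT_FIELDS)},
        }
    }
}

# Detector that splits on the single combined field.
detector = {
    "detector_description": f"min(child_count) by {RUNTIME_FIELD}",
    "function": "min",
    "field_name": "child_count",
    "by_field_name": RUNTIME_FIELD,
}

print(json.dumps(datafeed_fragment, indent=2))
print(json.dumps({"detectors": [detector]}, indent=2))
```

Each unique country/state/district combination then gets its own baseline, which is why the caution about sparse data applies.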

