Anomaly detection Population Job

country  state  child_count
a        a_s1   302
a        a_s2   310
a        a_s3   308
a        a_s4   21
b        b_s1   14
b        b_s2   16
b        b_s3   17
b        b_s4   218

I have a population anomaly detection job to find anomalies in child_count. The table above shows the kind of data we are processing. I want to find anomalies in child_count across the states of each country. As you can see, country 'a' has child_count values around 300 and one value of 21, which can be treated as an anomaly (compared with the other values for country 'a'). Country 'b' has child_count values in the range 14 to 17 and one value of 218, which is also an anomaly. There are no other anomalies in this case. But after processing the data with a population job, where the data is split by 'state', all of the data for one country is flagged as anomalous because it is compared with the first country. I don't want to compare the child_count of one country with another; I just want to compare it with the previous child_count values of the same country. How can I achieve this?
(The actual data contains high-cardinality values, which is why I used a population job here.)

First and foremost - does your data also include a timestamp? If the data isn't really temporal in nature, then you should consider doing an Outlier Detection analysis rather than a Population Analysis.

My data includes a timestamp. So can I make use of outlier detection?

please reply @richcollier

It depends. Outlier detection is an analysis that is mostly irrespective of time. Your data can come from a certain time period (e.g. house sale prices from 2022).

But population analysis is meant to be a moment-by-moment analysis: it takes every entity witnessed in an arbitrary time window (as in "last hour" or "last day") and compares those entities against a learned "global" model of all entities that has been built up over time (ever since the Population Analysis job has been running).
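To make the "compare each state only against its own country" idea concrete, here is a small Python sketch using the sample table from the question. It is only an illustration, not how Elastic's outlier detection or population analysis works internally: the median/MAD modified z-score and the 3.5 threshold are my own assumptions, and `flag_outliers_per_country` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import median

# Sample rows from the question: (country, state, child_count)
ROWS = [
    ("a", "a_s1", 302), ("a", "a_s2", 310), ("a", "a_s3", 308), ("a", "a_s4", 21),
    ("b", "b_s1", 14), ("b", "b_s2", 16), ("b", "b_s3", 17), ("b", "b_s4", 218),
]


def flag_outliers_per_country(rows, threshold=3.5):
    """Flag rows whose child_count is unusual relative to their own country only."""
    by_country = defaultdict(list)
    for country, state, count in rows:
        by_country[country].append((state, count))

    flagged = []
    for country, items in by_country.items():
        counts = [count for _, count in items]
        med = median(counts)
        # Median absolute deviation: a robust estimate of spread
        mad = median([abs(count - med) for count in counts])
        if mad == 0:
            continue  # all values (nearly) identical; nothing to score
        for state, count in items:
            # Modified z-score (assumed scoring rule, threshold 3.5 by convention)
            score = 0.6745 * abs(count - med) / mad
            if score > threshold:
                flagged.append((country, state, count))
    return flagged


print(flag_outliers_per_country(ROWS))
```

Because each country is scored against its own median, the values around 300 in country 'a' are never compared with the values around 15 in country 'b'.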

OK, thanks. Can I use multiple arguments inside "by_field_name"? I want to split the data on three field values (state, country, district) and analyze each split with respect to its own history in a population job.

"detectors": [
  {
    "detector_description": "min(child_count) by \"country.keyword\"",
    "function": "min",
    "field_name": "child_count",
    "by_field_name": ["country.keyword,state.keyword,district.keyword"],
    "detector_index": 0
  }
]

But the above throws an error. How can I pass multiple fields and split the analysis on them?

There are two fields that allow splits:

partition_field_name
by_field_name

But if you need to split further, you'll have to rely on another method. Namely, you'll have to create a runtime field that is a concatenation of two (or more) fields, something like the "Concatenate two fields" example in the Elastic content share.

Then choose the runtime field as the field to split on. Word of caution: don't split the data too thin, or you might wind up with very few data points for each unique combination and thus have sparse data.
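As a rough sketch of that approach (untested against a real cluster), the Python below builds the two request bodies: a datafeed fragment whose `runtime_mappings` defines a keyword runtime field concatenating the three fields with a Painless `emit(...)` script, and a job fragment whose detector uses that single field as `by_field_name`. The index name `population-data`, the `@timestamp` time field, the `1d` bucket span, the `|` separator, and the helper name `build_configs` are all assumptions for illustration; the datafeed would also need its `job_id` when you create it.

```python
import json


def build_configs(fields, separator="|", time_field="@timestamp"):
    """Build job and datafeed fragments that split on a concatenated runtime field."""
    # Hypothetical runtime field name, e.g. "country_state_district"
    runtime_field = "_".join(f.replace(".keyword", "") for f in fields)

    # Painless script that concatenates the doc values of all fields
    parts = [f"doc['{f}'].value" for f in fields]
    source = "emit(" + f" + '{separator}' + ".join(parts) + ")"

    job = {
        "analysis_config": {
            "bucket_span": "1d",  # assumed; pick what suits your data
            "detectors": [
                {
                    "function": "min",
                    "field_name": "child_count",
                    # by_field_name takes ONE field name (a string), not a list
                    "by_field_name": runtime_field,
                }
            ],
        },
        "data_description": {"time_field": time_field},
    }
    datafeed = {
        "indices": ["population-data"],  # assumed index name
        "runtime_mappings": {
            runtime_field: {"type": "keyword", "script": {"source": source}}
        },
    }
    return job, datafeed


job, datafeed = build_configs(["country.keyword", "state.keyword", "district.keyword"])
print(json.dumps(job, indent=2))
print(json.dumps(datafeed, indent=2))
```

With a plain by-field split like this (no over field), each unique country/state/district combination gets its own model and is compared against its own history, which seems to be what you're after.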
