ML: difference between partition_field_name and by_field_name in a population job?

richcollier · November 1, 2021, 6:39pm

Fact 1: Splitting a population job (one that defines an over_field) ultimately creates sub-populations.
Fact 2: Using a partition_field is a "hard" split, meant to separate and isolate each value from the others. In population analysis, you may want to isolate sub-populations from each other.
Fact 3: Using a by_field is more of a "soft" split, where the values of the by_field are more like attributes of an entity. Anomalies are aggregated per member of the population, so the anomaly severity for an entity increases as more of that member's values are unusual at the same time.

For example, imagine a data set:

time,user,gender,feature_name,feature_value
0,Bob,male,age,30
0,Bob,male,weight,175
0,Bob,male,height,75
1,Sakura,female,age,44
1,Sakura,female,weight,105
1,Sakura,female,height,59
...

You could set up an analysis like:
max(feature_value) by_field=feature_name partition_field=gender over_field=user

where:

over_field=user - makes sense since we want to model users as members of a population
by_field=feature_name - age, weight, height are attributes of a particular user
partition_field=gender - might make sense to isolate genders from each other because men are generally bigger/heavier than women.
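
For reference, the shorthand above could map onto an anomaly detection job body roughly as follows. This is only a sketch: the detector field names (function, field_name, by_field_name, over_field_name, partition_field_name) are the standard job configuration keys, but the bucket_span value and the use of "time" as the time field are assumptions made for this example data set, not something stated in the original post.

```python
import json

# Detector mirroring the shorthand from the post:
# max(feature_value) by_field=feature_name partition_field=gender over_field=user
detector = {
    "function": "max",
    "field_name": "feature_value",
    "by_field_name": "feature_name",    # soft split: attributes of a user
    "over_field_name": "user",          # population: users are the members
    "partition_field_name": "gender",   # hard split: isolate sub-populations
}

# Hypothetical job body. bucket_span and time_field are assumed values
# chosen for illustration; adjust them to the real data.
job_body = {
    "analysis_config": {
        "bucket_span": "15m",
        "detectors": [detector],
    },
    "data_description": {"time_field": "time"},
}

# Print the JSON that would go in the request body when creating the job.
print(json.dumps(job_body, indent=2))
```

The printed JSON is what you would send as the body of a PUT _ml/anomaly_detectors/<job_id> request. Swapping the by_field_name and partition_field_name values is all it takes to compare the two kinds of split on the same data.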
