Fact 1: Splitting a population job (one that defines an over_field) ultimately creates sub-populations.
Fact 2: A partition_field is a "hard" split, meant to isolate each value from the others. In population analysis, you may want to isolate sub-populations from each other in this way.
Fact 3: A by_field is more of a "soft" split, where the values of the by_field act like attributes of an entity. Anomalies for each distinct member of the population are aggregated so that an entity's severity increases with the number of simultaneously unusual values for that member.
For example, imagine a data set:
time,user,gender,feature_name,feature_value
0,Bob,male,age,30
0,Bob,male,weight,175
0,Bob,male,height,75
1,Sakura,female,age,44
1,Sakura,female,weight,105
1,Sakura,female,height,59
...
You could set up an analysis like:
max(feature_value) by_field=feature_name partition_field=gender over_field=user
where:
- over_field=user: makes sense since we want to model users as members of a population
- by_field=feature_name: age, weight, and height are attributes of a particular user
- partition_field=gender: might make sense to isolate genders from each other, because men are generally bigger and heavier than women
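
As a rough sketch, the shorthand detector above could be written out as a job configuration along the following lines. This assumes the key names used by Elasticsearch ML anomaly detection jobs (by_field_name, over_field_name, partition_field_name). The bucket_span value and the influencers list are illustrative assumptions and do not come from the example above.

```python
import json

# Detector equivalent to:
#   max(feature_value) by_field=feature_name partition_field=gender over_field=user
# Key names assume the Elasticsearch ML anomaly detection job format.
detector = {
    "function": "max",
    "field_name": "feature_value",
    "by_field_name": "feature_name",    # "soft" split: attributes of a user
    "over_field_name": "user",          # members of the population
    "partition_field_name": "gender",   # "hard" split: isolated sub-populations
}

job_config = {
    "analysis_config": {
        "bucket_span": "15m",                # assumed value, tune for your data
        "detectors": [detector],
        "influencers": ["user", "gender"],   # assumed, optional
    },
    "data_description": {"time_field": "time"},
}

# Print the request body so it can be inspected or pasted elsewhere.
print(json.dumps(job_config, indent=2))
```

Listing the over and partition fields as influencers is a common choice, but whether it helps depends on the data.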