If you want to “hard split” the analysis, select a “partition_field_name”
- The field chosen should generally have fewer than 10,000 distinct values per job, since partitioning requires more memory
- Each instance of the field is like an independent variable
- Anomalies are scored more independently for each partition (especially in v6.5+)
If you want a “soft split”, select a “by_field_name”
- The field chosen should generally have fewer than 100,000 distinct values per job
- More appropriate for attributes of an entity (dependent variables)
- Scoring considers the history of other by-fields
By the way, in an Advanced job you can use both together, effectively getting a double split. For example:
count by error_code partition=host
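
As a rough sketch of how that shorthand might map onto a job's detector configuration, the Python below builds the detector as a plain dict. The helper `build_detector` is hypothetical (it is not part of any Elastic client), and the `bucket_span` value is an assumption chosen only for illustration; `function`, `by_field_name` and `partition_field_name` are the detector settings discussed above.

```python
import json


def build_detector(function, by_field=None, partition_field=None):
    """Hypothetical helper: return a detector dict with only the splits requested."""
    detector = {"function": function}
    if by_field is not None:
        # "Soft split": scoring considers the history of the other by-field values
        detector["by_field_name"] = by_field
    if partition_field is not None:
        # "Hard split": each partition value is analysed more independently
        detector["partition_field_name"] = partition_field
    return detector


# Double split, equivalent to: count by error_code partition=host
double_split = build_detector("count", by_field="error_code", partition_field="host")

analysis_config = {
    "bucket_span": "15m",  # assumed value, for illustration only
    "detectors": [double_split],
}

print(json.dumps(analysis_config, indent=2))
```

Leaving out `by_field` or `partition_field` gives the single-split variants, so the same helper covers the "hard split" only and "soft split" only cases.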