ML: What is the difference between by_field_name and partition_field_name?

  • Both methods split data to establish separate baselines.
  • They can be used separately or together in one detector (e.g. count by error_type partition_field_name=host)
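
As a rough illustration of the combined detector mentioned above, here is a minimal sketch of a job body that uses both fields. The bucket_span value and the @timestamp time field are assumptions made for the example, not recommendations, and the field names error_type and host are taken from the example above.

```python
import json

# Sketch of an anomaly detection job body with one detector that
# uses both split types. Values marked "assumed" are illustrative only.
job_body = {
    "analysis_config": {
        "bucket_span": "15m",  # assumed bucket span
        "detectors": [
            {
                "function": "count",
                "by_field_name": "error_type",   # "soft split"
                "partition_field_name": "host",  # "hard split"
            }
        ],
    },
    "data_description": {"time_field": "@timestamp"},  # assumed time field
}

# Print the body as JSON, e.g. to paste into a create-job request.
print(json.dumps(job_body, indent=2))
```

A body like this would typically be sent when creating the job (PUT _ml/anomaly_detectors/<job_id>). Dropping either by_field_name or partition_field_name from the detector gives the single-split variants.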

If you want to “hard split” the analysis, select a “partition_field_name”:

  • In general, the field chosen should have < 10,000 distinct values per job, because partitioning requires more memory
  • Each value of the field is treated like an independent variable
  • Anomalies are scored more independently for each partition

If you want a “soft split”, select a “by_field_name”:

  • In general, the field chosen should have < 100,000 distinct values per job
  • More appropriate for attributes of an entity (dependent variables)
  • Scoring considers the history of the other by-field values