ML: What is the difference between by_field_name and partition_field_name?

In the detector configuration object there are two fields:
by_field_name: used to split the data
partition_field_name: used to segment the analysis

What is the difference between them?

Thanks in advance

  • Both fields split the data so that a separate baseline is established for each value.
  • They can be used separately or together in one detector (e.g. count by error_type partition_field=host)
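As a rough sketch of the combined case mentioned above, a detector that uses both fields could be written like this. The field names error_type and host come from the example; the bucket_span value and the description text are assumptions chosen for illustration, not recommendations.

```python
import json

# One detector that uses both split types together.
# "error_type" and "host" are the illustrative field names from the example above.
detector = {
    "function": "count",
    "by_field_name": "error_type",      # "soft split" within each partition
    "partition_field_name": "host",     # "hard split": one independent model per host
    "detector_description": "count by error_type partition_field=host",
}

# The detector sits inside the job's analysis_config.
# The 15m bucket_span is an assumed value for this sketch.
analysis_config = {
    "bucket_span": "15m",
    "detectors": [detector],
}

# Print the JSON that would form part of an anomaly detection job definition.
print(json.dumps(analysis_config, indent=2))
```

Dropping either by_field_name or partition_field_name from the dictionary gives the "used separately" variants.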

If you want to “hard split” the analysis, select a “partition_field_name”

  • The field chosen should generally have < 10,000 distinct values per job, because partitioning requires more memory
  • Each distinct value of the field is treated like an independent variable
  • Anomalies are scored more independently for each partition

If you want a “soft split”, select a “by_field_name”

  • The field chosen should generally have < 100,000 distinct values per job
  • More appropriate for attributes of an entity (dependent variables)
  • Scoring considers the history of the other by-field values
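One way to apply the cardinality guidelines above is to count the distinct values of a candidate field in a sample of your data before choosing a split. The sketch below is only an illustration: the helper names and sample records are hypothetical, and the thresholds are the rules of thumb quoted in this thread, not limits enforced by the product.

```python
# Rules of thumb from this thread (guidelines, not hard limits).
PARTITION_LIMIT = 10_000   # partition_field_name: < 10,000 distinct values per job
BY_LIMIT = 100_000         # by_field_name: < 100,000 distinct values per job


def distinct_count(records, field):
    """Count distinct values of `field` across records that contain it."""
    return len({record[field] for record in records if field in record})


def within_guideline(records, field, split):
    """Return True if the field's cardinality is under the guideline for `split`.

    `split` must be "partition" or "by".
    """
    if split not in ("partition", "by"):
        raise ValueError("split must be 'partition' or 'by'")
    limit = PARTITION_LIMIT if split == "partition" else BY_LIMIT
    return distinct_count(records, field) < limit


# Hypothetical sample documents.
sample = [
    {"host": "web-1", "error_type": "timeout"},
    {"host": "web-1", "error_type": "refused"},
    {"host": "web-2", "error_type": "timeout"},
]

print(distinct_count(sample, "host"))
print(within_guideline(sample, "host", "partition"))
print(within_guideline(sample, "error_type", "by"))
```

A small sample can understate the true cardinality, so the result is best read as a lower bound.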
