It seems like m365_defender creates duplicate events. This seems to be caused by unstable ordering of the document fields. In our data, the fields inside the alert object change their order, as do some of the fields in agent. I think this is the primary issue causing the duplication.
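To illustrate why field order alone could produce duplicates, here is a minimal Python sketch. It assumes (we have not confirmed this) that a document ID or hash is derived somewhere from the serialized document. The sample documents and the helper names `naive_id` and `canonical_id` are hypothetical.

```python
import hashlib
import json


def naive_id(doc: dict) -> str:
    """Hash the document exactly as serialized, so key order affects the result."""
    return hashlib.sha256(json.dumps(doc).encode("utf-8")).hexdigest()


def canonical_id(doc: dict) -> str:
    """Hash a canonical form (keys sorted at every level), so key order is ignored."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two hypothetical copies of the same alert; only the key order differs.
first = {"alert": {"providerAlertId": "a1", "severity": "high"}}
second = {"alert": {"severity": "high", "providerAlertId": "a1"}}

print(naive_id(first) == naive_id(second))          # False: looks like two events
print(canonical_id(first) == canonical_id(second))  # True: recognized as one event
```

If something like the first function is in play, the same event read twice with a different field order would be indexed as two separate documents.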
Our idea is to add some static fingerprinting to the ingest pipeline and see if that solves the issue. This isn't trivial, though, since it seems like Sentinel updates the read object. The challenge is choosing which fields to include so that duplicates are removed without updated events colliding with the originals.
We have two suggestions!
Suggestion 1:
processors:
  - fingerprint:
      fields: ["microsoft.m365_defender.incidentUri"]
      target_field: "@metadata._id"
This would probably cause updated events to collide, as would events that differ only in their entity types, so we don't think this is a good idea.
Suggestion 2:
processors:
  - fingerprint:
      fields:
        - "microsoft.m365_defender.incidentUri"
        - "microsoft.m365_defender.alerts.providerAlertId"
        - "microsoft.m365_defender.alerts.entities.entityType"
      target_field: "@metadata._id"
This seems more sane, but the drawback is that we see some events being updated with new values in only one field. If that field is not part of the fingerprint, the updated event gets the same ID as the original, so we would not collect it.
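The trade-off between the two suggestions can be sketched in Python. This is a simplified stand-in for the fingerprint processor, not its actual algorithm. The flattened field names, the `status` field, and the sample values are hypothetical.

```python
import hashlib
from typing import Sequence


def fingerprint(event: dict, fields: Sequence[str]) -> str:
    """SHA-256 over the selected field names and values, in the order given."""
    digest = hashlib.sha256()
    for name in fields:
        digest.update(f"|{name}|{event.get(name)}".encode("utf-8"))
    return digest.hexdigest()


SUGGESTION_1 = ["incidentUri"]
SUGGESTION_2 = ["incidentUri", "providerAlertId", "entityType"]

# Hypothetical, flattened events for illustration only.
original = {
    "incidentUri": "https://example.invalid/incidents/1",
    "providerAlertId": "a1",
    "entityType": "User",
    "status": "New",
}
updated = dict(original, status="Resolved")         # same alert, one field changed
other_alert = dict(original, providerAlertId="a2")  # another alert, same incident

# Suggestion 1: everything in the same incident shares one ID.
print(fingerprint(original, SUGGESTION_1) == fingerprint(other_alert, SUGGESTION_1))  # True

# Suggestion 2: different alerts get different IDs ...
print(fingerprint(original, SUGGESTION_2) == fingerprint(other_alert, SUGGESTION_2))  # False

# ... but an update to a field outside the fingerprint keeps the old ID.
print(fingerprint(original, SUGGESTION_2) == fingerprint(updated, SUGGESTION_2))      # True
```

Adding every field that can change to the fingerprint would let updates through, but only if the field values themselves are serialized in a stable order.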
Fixing the ordering issue itself, or whatever is causing the unstable ordering in the data, would be better, but it's hard to figure out where that happens.