Duplicate events ingested by m365_defender module

It seems like the m365_defender module creates duplicate events. The apparent cause is non-stable ordering of the document fields: in our data, the fields inside the alert object change their order, as do some of the fields in agent. I think this is the primary issue causing the duplication.
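To illustrate why field order could matter, here is a minimal Python sketch. It assumes the document ID is derived from a hash of the serialized event, which I have not confirmed for this module; the `severity` field and all values are made up for the example.

```python
import hashlib
import json


def naive_id(doc: dict) -> str:
    # Hash the document exactly as serialized; key order changes the bytes.
    return hashlib.sha256(json.dumps(doc).encode("utf-8")).hexdigest()


def canonical_id(doc: dict) -> str:
    # Sort keys at every nesting level before hashing, so order is irrelevant.
    payload = json.dumps(doc, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical sample: same content, different field order inside "alert".
first = {"alert": {"providerAlertId": "a1", "severity": "high"}}
second = {"alert": {"severity": "high", "providerAlertId": "a1"}}

naive_differs = naive_id(first) != naive_id(second)
canonical_matches = canonical_id(first) == canonical_id(second)
```

If the ID really is order-sensitive like `naive_id`, two reads of the same alert would be indexed as two documents.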

Our idea is to add static fingerprinting to the ingest pipeline and see if that solves the issue. This isn't trivial, though, since Sentinel seems to update the object being read. The challenge is choosing a set of fields that deduplicates the events without making updated events collide with the originals.

We have two suggestions!

Suggestion 1:

  - fingerprint:
      fields: ["microsoft.m365_defender.incidentUri"]
      target_field: "@metadata._id"

This would probably cause updated events to collide, along with events where the entity types change, so we don't think this is a good idea.

Suggestion 2:

  - fingerprint:
      fields:
        - "microsoft.m365_defender.incidentUri"
        - "microsoft.m365_defender.alerts.providerAlertId"
        - "microsoft.m365_defender.alerts.entities.entityType"
      target_field: "@metadata._id"

This seems more sane, but the drawback is that we see some events being updated with new values in only one field. This fingerprinting would not allow us to collect the updated events.
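The trade-off can be sketched in Python. This is not the fingerprint processor's actual algorithm, only an illustration of hashing a fixed set of fields; the flattened field names, the `status` field, and the sample values are assumptions for the example.

```python
import hashlib

# Flattened stand-ins for the three fields in Suggestion 2.
FIELDS = ("incidentUri", "providerAlertId", "entityType")


def fingerprint(event: dict, fields=FIELDS) -> str:
    # Hash only the selected fields, in a fixed (sorted) order.
    digest = hashlib.sha256()
    for name in sorted(fields):
        digest.update(f"|{name}|{event.get(name, '')}".encode("utf-8"))
    return digest.hexdigest()


# Hypothetical event; "status" stands in for a field outside the fingerprint.
original = {
    "incidentUri": "https://example.invalid/incidents/1",
    "providerAlertId": "a1",
    "entityType": "User",
    "status": "New",
}
reordered = dict(reversed(list(original.items())))
updated = {**original, "status": "Resolved"}

same_for_reordered = fingerprint(original) == fingerprint(reordered)
same_for_updated = fingerprint(original) == fingerprint(updated)
```

Reordering no longer creates a new ID, but neither does the update to `status`, which is the drawback described above.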

Fixing the ordering issue itself, or whatever causes the ordering to vary in the data, would probably be better. But it's hard to figure out where that happens.

Thanks for the feedback @Foxboron! And welcome to the community! :slightly_smiling_face: :wave:

I've forwarded along your feedback to the team, and also added it to this issue for migrating the M365 Defender Module to an Elastic Agent Integration. Perhaps this can be addressed as part of that effort.

