MISP and Elastic Security

I have a use case where I want to index attributes from MISP into Elasticsearch so I can use the new Threat Matching rule in Elastic Security 7.10.
Everything is working fine, but when I ingest newer attributes from MISP, the old ones are ingested again as well, which creates a huge number of duplicates. For now I delete the whole index before each ingest from MISP.
Is there a way to prevent duplicate events from being ingested again?

There are a couple of ways to do this. For starters, how are you ingesting the MISP data? Are you using custom or third-party code?

I am using the Filebeat MISP module to ingest the data, though I had to tweak a few things in Filebeat to make it stable.

Hello @Ameer_Mukadam, sorry for the slow response, it's the middle of the busy Christmas period :slight_smile:

Let me go over a few different pointers so that you can choose which approach you want to take.

In terms of tweaking the MISP module, there are some known comments, requests, and reports about the module that I am currently working on. While we cannot say which release the changes will land in, I wanted to at least clarify that this is a known case.

Now, in terms of fixing duplicates, there are a few ways to go about it. If you want to ingest each unique event only once, you will need to override the default, automatically generated document ID. That way, if you later try to ingest a duplicate, its document ID will be the same as the existing one and the event will be dropped.

There are multiple ways to handle this. In terms of tweaking, have you already been modifying the local JavaScript that handles the events, or have you been focusing on the ingest pipelines? If so, you can choose to set any of the unique IDs in the MISP event as the "@metadata._id" field.

If you would rather make the change with Filebeat processors, the easiest way is to use the fingerprint processor to create a hash of one or more fields of your choosing that together are unique to that event.

Processor example:

      - fingerprint:
          fields: ["json.orgId", "json.created", "json.event"]
          target_field: "@metadata._id"

Let me know how it goes and which way you went! Feedback is always appreciated here :slight_smile:


Hi @Marius_Iversen, not a problem, I know it's the holiday period :wink:

Anyway, the tweak I mentioned was in the MISP module's manifest.yml. I changed the method type from GET to POST, since that is the recommended way according to MISP's documentation. I also increased the timeout period, because the MISP instance holds a large number of attributes and the request was failing with the default timeout. I then added a request body to pass with the POST request so I can filter the output from MISP, and I added the interval period, because right now it pulls once and then stops.
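For readers who want to picture those four changes, here is a rough sketch written as a standalone httpjson input rather than as the module's manifest.yml. The option names are the ones the 7.x httpjson input documents. The host name, API key placeholder, timeout, interval, and body filters are assumptions chosen for illustration, not values from this thread.

```yaml
# Sketch only: a standalone httpjson input (7.x option names).
# URL, key, durations, and body filters are placeholders.
filebeat.inputs:
  - type: httpjson
    url: https://misp.example.org/attributes/restSearch   # hypothetical host
    http_method: POST            # POST instead of the default GET
    http_headers:
      Authorization: "<MISP API key>"
    http_client_timeout: 120s    # raised for large result sets
    interval: 10m                # poll repeatedly instead of once
    http_request_body:           # encoded to JSON and sent with each POST
      returnFormat: json
      timestamp: 1d              # assumed filter: attributes edited in the last day
```

A time filter in the request body narrows each poll, but consecutive polls can still overlap, so it complements deduplication by document ID rather than replacing it.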

And yes, I was also reading about the fingerprint feature as a way to prevent ingesting duplicates. Since we are new to Elastic, I was waiting to confirm that I was on the right track. I will test it and report back.

That is good feedback, much appreciated @Ameer_Mukadam!

We have recently merged a new version of HTTPJson, one that has a lot more of the functionality you are talking about, for example:

  1. Using a field from the last response in the next request (for example remembering, even across reboots, the timestamp of the newest event that was already ingested). That way we can ensure that the API calls only ever request data newer than the last one.
  2. For the future release, our approach to MISP is much closer to what you are describing: polling at close intervals, collecting only the relevant data, and providing a field to use for filtering.
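As a rough idea of what the first point could look like, the sketch below uses the cursor and transform syntax from the master-branch httpjson documentation linked further down. It is not the configuration of the upcoming module: the host, the key placeholder, the `body.response.Attribute` split target, and the choice of `timestamp` as the cursor field are all assumptions, and option names may differ in the release you run.

```yaml
# Sketch only: httpjson with a cursor that remembers the last timestamp.
filebeat.inputs:
  - type: httpjson
    config_version: 2
    interval: 5m
    request.url: https://misp.example.org/attributes/restSearch   # hypothetical host
    request.method: POST
    request.transforms:
      - set:
          target: header.Authorization
          value: "<MISP API key>"
      - set:
          target: body.returnFormat
          value: json
      - set:
          # Ask only for data newer than the stored cursor value.
          target: body.timestamp
          value: '[[.cursor.timestamp]]'
          default: 1d            # assumed first-run lookback
    response.split:
      target: body.response.Attribute   # assumed response shape
    cursor:
      timestamp:
        # Persisted between runs, taken from the last event of each poll.
        value: '[[.last_event.timestamp]]'
```

Because the cursor is taken from the last event in a response, this only works cleanly if the API returns results in timestamp order, so a fingerprint-based document ID is still a useful safety net.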

Fingerprint is indeed the way to go, and it's how we usually approach this in modules that currently poll duplicated data.

If you want to get some notifications on how that is going and when it might be closed, you can add a watch/subscribe to this PR: https://github.com/elastic/beats/pull/21795

As for the new HTTPJson features, since they are merged into master (which does not mean they have an official release yet), you can preview the documentation in the master branch to take a look at the features and new formats: https://www.elastic.co/guide/en/beats/filebeat/master/filebeat-input-httpjson.html

Thanks for all the information @Marius_Iversen, I will definitely have a look at it.
I have a small query and would like some help with it. The events I am ingesting with the MISP module do not have many unique values that I could use for fingerprinting. Can I use the IP address field as the fingerprint field? The way the events are structured right now, the rule name is the threat feed name, and the event contains attributes such as IPs, hashes, domains, etc. Multiple attributes can have the same rule name or rule ID, so I cannot use that field for the fingerprint. The only fields that are unique per rule are the IP address, file hash, or URL fields.
Is this the right approach, or is there a better way of doing this?

I think when ingesting MISP data, the main focus is commonly on the Attributes, and each Attribute has its own unique ID, so personally I would use its uuid field:

Example MISP data from attributes:

	{
		"Event": {
			"Attribute": {
				"Galaxy": [],
				"ShadowAttribute": [],
				"category": "External analysis",
				"comment": "Carbon sample - Xchecked via VT: a08b8371ead1919500a4759c2f46553620d5a9d9",
				"deleted": false,
				"disable_correlation": false,
				"distribution": "5",
				"event_id": "4",
				"first_seen": null,
				"id": "342",
				"last_seen": null,
				"object_id": "0",
				"object_relation": null,
				"sharing_group_id": "0",
				"timestamp": "1490878550",
				"to_ids": false,
				"type": "link",
				"uuid": "58dd0056-6e74-43d5-b58b-494802de0b81",
				"value": "https://www.virustotal.com/file/7fa4482bfbca550ce296d8e791b1091d60d733ea8042167fd0eb853530584452/analysis/1486030116/"
			}
		}
	}
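Applied to the sample above, the fingerprint processor might look like the following sketch. The field path `json.Event.Attribute.uuid` is an assumption based on the shape of the sample and the `json.` prefix used in the earlier processor example, so check it against a raw event from your own setup before relying on it.

```yaml
# Sketch only: derive the document ID from the attribute's uuid.
processors:
  - fingerprint:
      fields: ["json.Event.Attribute.uuid"]   # assumed field path
      target_field: "@metadata._id"
```

With this in place, polling the same attribute twice produces the same document ID, so the second copy is dropped instead of being indexed as a duplicate.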

I've also been tackling this over the past few weeks. I'm using a cron job to delete the index and re-index: https://github.com/hilt86/misp-importer

Thanks for pitching in as well @hilt86! I'll make sure we handle this a bit differently in a newer version, so that both filtering and deduplication are handled better.

Okay, so I did some more testing, and what I can see is that the uuid is based on the event and not the attribute, so different attributes under a single event can share the same uuid.

@Ameer_Mukadam Hmm okay, then using the fingerprint processor on the ID field would be better from that standpoint, as it would be unique to each event, right? It's an ever-growing number. If you are concerned that it might collide with something else, you could add a second field related to the specific log source (MISP).
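A minimal sketch of that idea is shown here. Both the `json.Event.Attribute.id` path and the `dedup_source` helper field are hypothetical: the first must be checked against what the module actually outputs, and the second is just a constant written to the event metadata so that it is hashed but not indexed.

```yaml
# Sketch only: hash the numeric id together with a constant source tag.
processors:
  - add_fields:
      target: "@metadata"        # metadata is not indexed with the event
      fields:
        dedup_source: misp       # hypothetical helper field
  - fingerprint:
      fields: ["json.Event.Attribute.id", "@metadata.dedup_source"]  # assumed path
      target_field: "@metadata._id"
```

The constant keeps a MISP id such as "342" from hashing to the same document ID as an identical number coming from another source that writes to the same index.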

I need to check whether the id field is different for each attribute or only for each event. This is very confusing, I am sorry. I wish there was a way to pull only the new attributes from MISP using its API, as I could then have configured that in the Filebeat module.

@Ameer_Mukadam and @hilt86

Before rewriting it into a new module, we have opened a PR that fixes the issues in the existing one, so that we can resolve the duplication and grab only events that are newer than the last API poll. It will then continuously fetch only the newest data.

Feel free to follow this for updates on when it will be merged: https://github.com/elastic/beats/pull/23070

Awesome - this makes my workaround obsolete which is great! Thanks @Ameer_Mukadam and @Marius_Iversen @marc.guasch et al


Hi Marius, I was looking at using the fingerprint processor on the “id” field, but the MISP Filebeat module doesn’t output the “id” field. Is there some other config file where I can add the fingerprint processor for the “id” field?