Packetbeat Rare DNS Questions ML Job Customization


I have an issue with the Packetbeat rare DNS question ML job: it generates quite a lot of anomalies because our hosts frequently contact * URLs, which contain a random part. For example:

These anomalies are picked up by SIEM, and a SIEM ML detection has no way to filter them:

I will need to tune or filter the ml job itself.

The query used in the ml job is:


So I'd like to discuss what would be the best long-term, flexible solution, so that I can exclude certain domains when needed without having to rebuild the ML job.

Some possible solutions:

  • I could filter out * in the ML datafeed query
  • Even better (so I don't have to use an expensive leading-wildcard query), I could filter on dns.question.registered_domain

But both options above would require me to stop the datafeed and the job and then update the datafeed query, which is not really user-friendly.
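To make the second option concrete, here is a minimal Python sketch that builds the body for a datafeed update (`POST _ml/datafeeds/<datafeed_id>/_update`). The job's real query is not reproduced in this thread, so `match_all` stands in for it, and `example.com` is a made-up domain; treat both as assumptions.

```python
import json


def build_datafeed_update(base_query, excluded_domains):
    """Wrap base_query in a bool query that drops the given registered domains.

    A `terms` clause on dns.question.registered_domain is an exact match,
    so no leading-wildcard query is needed.
    """
    return {
        "query": {
            "bool": {
                "filter": [base_query],
                "must_not": [
                    {
                        "terms": {
                            "dns.question.registered_domain": sorted(excluded_domains)
                        }
                    }
                ],
            }
        }
    }


# Placeholder for the job's existing datafeed query (assumption).
base_query = {"match_all": {}}

body = build_datafeed_update(base_query, {"example.com"})
print(json.dumps(body, indent=2))
```

The drawback described above still applies: the datafeed has to be stopped before this body can be sent.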

Ideally I'd love to use a whitelist filter list like this:


But dns.question.registered_domain is not offered as a field to scope on. Any feedback on how I could filter dynamically on dns.question.registered_domain is welcome.
Or is updating the datafeed query of the ML job my only option?
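For context: rule scopes are limited to fields the detector uses as its by, over, or partition field, which would explain why dns.question.registered_domain is not selectable. A rough Python sketch of the two request bodies involved follows: a filter (`PUT _ml/filters/<filter_id>`) and a job update that attaches a `skip_result` rule (`POST _ml/anomaly_detectors/<job_id>/_update`). It assumes the detector's by-field is dns.question.name; the filter ID `dns_safe_domains` and the domain are hypothetical.

```python
import json


def build_filter(items, description):
    """Body for creating a filter list.

    Filter items may carry a wildcard (*) at the start or end of an item.
    """
    return {"description": description, "items": sorted(items)}


def build_rule_update(filter_id, scope_field, detector_index=0):
    """Body for a job update that skips results whose scope_field value
    matches an item in the given filter."""
    return {
        "detectors": [
            {
                "detector_index": detector_index,
                "custom_rules": [
                    {
                        "actions": ["skip_result"],
                        "scope": {
                            scope_field: {
                                "filter_id": filter_id,
                                "filter_type": "include",
                            }
                        },
                    }
                ],
            }
        ]
    }


# Hypothetical filter ID and domain; by-field name is an assumption.
filter_body = build_filter({"*.example.com"}, "Domains to ignore")
update_body = build_rule_update("dns_safe_domains", "dns.question.name")

print(json.dumps(filter_body, indent=2))
print(json.dumps(update_body, indent=2))
```

With this approach the list of domains lives in the filter, so it can be edited later without touching the datafeed query.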



Have you used the filter lists under the Machine Learning settings? They might help with what you're trying to do. I haven't used them directly myself, but I hear good things about them from others. Note that they filter things out before the anomalies are produced, but for a lot of people that's exactly what they're aiming for:

I'm currently on holiday, but I'll definitely investigate the filter list capabilities further. Thanks

So I tried to use the filter list, but it doesn't seem to work as expected.


whitelist_server_domain contains image

But I still encounter anomalies with * URLs...

Am I missing something?

Looks like a good reason to open a support ticket

@richcollier Ticket 00614098 has been created. Grtz

@richcollier Just an fyi, I stumbled on this =>

After closing / reopening the job, it works.
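For anyone who wants to script the close/reopen step, here is a small Python sketch that lists the REST calls in order. The job and datafeed IDs passed in below are placeholders, not the real IDs from this thread.

```python
def restart_sequence(job_id, datafeed_id):
    """Return the (method, path) calls to stop the datafeed, close and
    reopen the job, and start the datafeed again, in that order."""
    return [
        ("POST", f"_ml/datafeeds/{datafeed_id}/_stop"),
        ("POST", f"_ml/anomaly_detectors/{job_id}/_close"),
        ("POST", f"_ml/anomaly_detectors/{job_id}/_open"),
        ("POST", f"_ml/datafeeds/{datafeed_id}/_start"),
    ]


# Placeholder IDs (assumption); substitute the IDs of your own job.
steps = restart_sequence("my_rare_dns_job", "datafeed-my_rare_dns_job")
for method, path in steps:
    print(method, path)
```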

While working on this, a few additional questions came up.

Is it possible to configure a rule for an ML job before the job has been started, for example at creation time or while editing? I'm asking because I created a new job from scratch and wanted to prevent internal URLs and other known domains that should be whitelisted from 'polluting' my ML model.

Afaik this is not possible yet. Is this already on Elastic's to-do list? If not, should I open a GH issue for it?

