Stuck on "going to run"

I have a detection rule that looks for indicator matches on a filebeat-misp index and every so often it will get stuck on "going to run" with a date that is in the past.

If it is failing, I'm none the wiser. I'd expect it to fail and retry, but it seems stuck in this phase. I'm running 7.10 on Elastic Cloud. Any ideas?

Hi @hilt86,

If it cannot complete, it would fail after 90 seconds and show an error in your error history tab. I would check there to make sure you are not getting timeouts.

If you see it getting stuck on "going to run", that usually indicates that it is currently running but taking a very long time. Right now we only have these states:

"going to run" | "succeeded" | "failed" | "partial failure"

It doesn't look like we have an actual "running" at the moment. We might change that in the future though if it is causing confusion.

If you do see it beginning to time out, I would look at reducing your threat list size. You could split it into a few smaller lists and make a separate rule per list, or query the list from separate rules, each with a query criterion that narrows it, to reduce the time it takes to look through the list.
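As a rough illustration of the "split it into smaller lists" idea, here is a minimal Python sketch. It is not part of Kibana or any Elastic API: `split_indicators` and the sample values are hypothetical, and it only shows how one large list could be partitioned so that each chunk backs its own rule.

```python
from typing import Iterator, List, Sequence


def split_indicators(indicators: Sequence[str], max_per_list: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `indicators`, each at most `max_per_list` long."""
    if max_per_list < 1:
        raise ValueError("max_per_list must be at least 1")
    for start in range(0, len(indicators), max_per_list):
        yield list(indicators[start:start + max_per_list])


# Hypothetical indicator values; each resulting chunk would back its own rule.
indicators = ["1.2.3.4", "5.6.7.8", "9.10.11.12", "13.14.15.16", "17.18.19.20"]
chunks = list(split_indicators(indicators, max_per_list=2))

for number, chunk in enumerate(chunks, start=1):
    print(f"list {number}: {chunk}")
```

How the chunks are then loaded (separate indexes, or one index with a field that each rule's indicator query filters on) depends on your setup.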

Also, if you do see it beginning to time out, you could look at the query you run against your data set in the indicator match rule and see whether you can reduce the amount of data it queries within that block of time.

OK, I think you're right, it is a timeout, but I don't quite understand why: my TI index is tiny (~10MB), and looking back 5 minutes in my other indexes isn't taxing either.

I have three other installs, so I will see if I can isolate this issue.

@Frank_Hassanabad this just showed up on another install. I've refined the search so that the indicator query filters for event.module: "misp" and the index query filters for the opposite, so that is as efficient as I can make the search. There must be something wrong, though, as the indexes are not large by any measure.

Oh hey again @hilt86 :slightly_smiling_face:

In addition to the bulk edit issue from the other thread, I created this issue for adding a running state to rules. No need to comment for priority, as we'll get this in there as soon as we can, since the current behavior is especially misleading for longer-running rules like Indicator Match rules. Thanks for raising this!

As for the general performance issues you're seeing, Indicator Match rules are still pretty fresh so we're working with the Elasticsearch team on some performance tweaks, both near-term and long-term, so stay tuned for those in an upcoming release.

Cheers!
Garrett

Hey @spong! Merry Christmas and Happy New year to you all at Elastic!

I've discovered another peculiarity of the indicator match. I have an install with a list of indicators that matches on destination.ip between the intel index and an auditbeat index.

If I run `nc -v 1.2.3.4 443` on a host in my fleet, I see the auditbeat event with the matching destination.ip, but the intel match doesn't occur (assuming 1.2.3.4 is in the list, of course).

This makes me wonder - what timeframe does the indicator match search within?

> what timeframe does the indicator match search within?

The indicator match will go through the entire indicator/threat list, using your indicator query as a filter. It will then attempt to match all of those queries against the source indexes you have given. It uses the mapping between the two indexes that you have in the UI to determine whether there is a match.
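To make that description concrete, here is a small conceptual model in Python. It is only a sketch of the logic as described above, not how Kibana actually implements it (the real rule builds Elasticsearch queries rather than looping over documents). The function name `indicator_match`, the flat dictionary documents, and the sample data are all assumptions for illustration; the destination.ip mapping and the event.module filter are taken from this thread.

```python
from typing import Callable, Dict, List, Tuple

Doc = Dict[str, str]


def indicator_match(
    source_docs: List[Doc],
    indicator_docs: List[Doc],
    indicator_filter: Callable[[Doc], bool],
    field_mapping: List[Tuple[str, str]],
) -> List[Tuple[Doc, Doc]]:
    """Return (source_doc, indicator_doc) pairs where every mapped field is equal.

    `field_mapping` holds (source_field, indicator_field) pairs.
    """
    # Step 1: the indicator query acts as a filter over the whole indicator list.
    filtered = [ind for ind in indicator_docs if indicator_filter(ind)]

    # Step 2: compare each remaining indicator with each source document.
    matches = []
    for ind in filtered:
        for doc in source_docs:
            if all(
                src in doc and tgt in ind and doc[src] == ind[tgt]
                for src, tgt in field_mapping
            ):
                matches.append((doc, ind))
    return matches


# Hypothetical sample data modelled on the example in this thread.
intel = [
    {"event.module": "misp", "destination.ip": "1.2.3.4"},
    {"event.module": "other", "destination.ip": "5.6.7.8"},
]
auditbeat = [
    {"destination.ip": "1.2.3.4"},
    {"destination.ip": "5.6.7.8"},
]

matches = indicator_match(
    auditbeat,
    intel,
    indicator_filter=lambda ind: ind.get("event.module") == "misp",
    field_mapping=[("destination.ip", "destination.ip")],
)
print(matches)
```

In this model, a missing match means one of three things: the indicator was filtered out by the indicator query, the mapped field is absent on one side, or the values differ. Those are the things the exported rule and sample data would help check.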

Can you export your rule query for the indicator match and give us the mapping and the sample data set of your Indicator index that has 1.2.3.4 within it?

We might be able to spot something going on.

What is the query? Did you use maxspan=#?