Threshold rules not triggering on self-made index

Hello,

I am currently trying to create a simple threshold rule that should trigger after X failed logons.
The issue is that the rule does not trigger at all, not even when the threshold is set to 1.

This is my rule:

The rule is set to run every 5 minutes with a 1 minute lookback (these are the default settings)

I absolutely do have events that match the rule; here is a screenshot of my Discover tab and the dev tools output.

Discover tab

Dev tools

For the record: I have tried it with both user.name.keyword and user.name. In the past my rules threw an error when I didn't use the .keyword field, due to aggregation problems; that no longer happens. I have changed nothing about the index settings or its mapping.
No matter which field I use (user.name or user.name.keyword), no error occurs and no signal is created.

Hello @madduck,

Thanks for trying it out.

I saw the image attached. Have you tried using user.name.keyword when setting up the threshold field? That's what I found different from the query you have in dev tools.

Bear with me; I just noticed that you get the same result with or without the .keyword field.

What about removing the filter conditions one at a time and seeing whether the result changes accordingly?

Hi Angela,

thanks for reaching out.
I have reduced the filter statements one by one, but to no avail.

I even tried multiple different aggregation fields and none of them worked.

Could there be some underlying problem with my index? I checked whether the user.name.keyword field is populated, and it is. Furthermore, it is marked as aggregatable in my Kibana UI.

Hmm, OK. May I have the Kibana version number that you're using?

The whole stack runs on 7.9.0

Do you have the @timestamp field in your index? It seems that in 7.9 we don't completely support sources without @timestamp; not sure if that's your case.

Would you like to try updating the lookback time to 24 hours and see how it goes?

Hi Angela,

I have tried changing lookback times and even tried multiple other fields. None of that seemed to work. Also, yes, I do have an @timestamp field in my index.

However, I tried leaving the aggregation field empty, or rather I left the default setting 'All results' in place, and that did generate some alerts. However, the signal was not populated with any information except the fields present in the 'Custom query' string.

I will upgrade to 7.9.2 and see if that changes anything. I doubt it though; I think there is some issue with how I set up my index.

Hey @madduck,

I have tried changing lookback times and even tried multiple other fields. None of that seemed to work, also yes I do have an @timestamp field in my index.

Can you send us the mapping for your index, either here or through a DM, along with a complete sample record from the index (even if you have to hand-edit some data to protect the innocent)? What's interesting in the sample record screenshot is that this field:

event.timestamp

Has this date timestamp:

05 Oct 2020 17:52:29.576736

Which looks like a custom date-time field, which makes me want to see your @timestamp mapping and your other mappings to see if I can replicate what you're seeing. We have seen issues in the past when there is no time zone associated with @timestamp.

The best choice is always going to be strict_date_optional_time for timestamps, with data that looks like the following:

2020-08-13T11:23:24.914Z
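As a quick illustration (a hypothetical Python sketch, not something from the thread itself), a timestamp in that shape can be produced like this:

```python
from datetime import datetime, timezone

def es_timestamp(dt: datetime) -> str:
    """Format a datetime as an ISO-8601 UTC string with millisecond
    precision and a trailing 'Z', i.e. the shape strict_date_optional_time
    parses happily: 2020-08-13T11:23:24.914Z"""
    dt = dt.astimezone(timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"

print(es_timestamp(datetime(2020, 8, 13, 11, 23, 24, 914000, tzinfo=timezone.utc)))
# -> 2020-08-13T11:23:24.914Z
```

The key point is the explicit UTC conversion and the trailing "Z", so the time zone is never ambiguous.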

However the signal was not populated with any kind of information, except the fields present in the 'Custom query' string.

Yes, that is correct at the moment. This is the fun part where I'm about to ask you for feedback since you have been using it! :slight_smile:

This feature populates the fields from the query "in a synthetic manner" rather than looking at the records being returned. I think there are some trade-offs here between speed of aggregation and analyzing potentially lots of records, since the threshold value is "unbounded" with regard to how many counts you are looking for before it fires.

However, I think most people are using thresholds in the low (< 100) range. (What is your largest threshold value, by the way, so we know?)

Now, wanting the other fields populated sounds like an expectation on your part, no? A "why are all these other fields empty!?" kind of question.

Well, our first hurdle is that the other fields not included in the query could either have the same value across the N aggregated records or be completely different for each record. That doesn't mean we aren't going to be clever about how to represent those fields or make them searchable in a signal or a special type; we just haven't done that part yet.

Hey Frank,

thank you for reaching out!

This is my index mapping:

And here is an indexed failed login entry:

Due to the 14k character limit I had to upload them to pastebin. I will, however, also send you both of these as a PM for completeness' sake.

If you need more documents just tell me, but almost all documents in that index are successful and failed logon events. The only difference between them is that event.outcome changes from "failed" to "successful".

My largest threshold at the moment is just 5, though it could go into the hundreds depending on the use case I'm implementing.

Not really an expectation, more an assumption.

I assumed that, since you are unable to change which fields the signal table displays, it would automatically look for the default fields like "user.name", "host.name", "source.ip", etc. on its own in the events that caused the signal to fire.

But what you said about different records having different entries makes sense.
Keep in mind that I only tried this to see whether a signal could be generated at all and what the generated signal might look like.

//Edit: PMs/DMs also have a character limit. I am unable to send you my mappings through the forums; please use the linked pastebins.

Hey @madduck,

Thanks for posting the mapping and sample data. I think I found a mapping issue. I loaded your mapping into a "delete me" temp index, and when I do this in dev tools:

GET ldap-delme/_mapping/field/user.name
{
  "ldap-delme" : {
    "mappings" : {
      "user.name" : {
        "full_name" : "user.name",
        "mapping" : {
          "name" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    }
  }
}

I can see that user.name is a text field and not a keyword, compared to, say, auditbeat, which has it flipped the other way:

{
  "auditbeat-7.8.0-2020.08.13-000003" : {
    "mappings" : {
      "user.name" : {
        "full_name" : "user.name",
        "mapping" : {
          "name" : {
            "type" : "keyword",
            "ignore_above" : 1024,
            "fields" : {
              "text" : {
                "type" : "text",
                "norms" : false
              }
            }
          }
        }
      }
    }
  }
}

So when I try to use a threshold with user.name, which is going to attempt an aggregation, I get the error below. I am surprised if you don't see the errors in your UI and failure history, but we have recently added a lot more bubbling-up of error messages from the backend for this upcoming release, which I'm using; that might be why you're not seeing errors:

My suggestion would be to change your mapping, then do a reindex, and give thresholds another try to see if that fixes things for you.
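As a rough sketch of what that could look like in dev tools (the destination index name "ldap-fixed" is a placeholder, and the mapping mirrors the ECS-style keyword-first layout from the auditbeat example; adjust to your own fields):

```
PUT ldap-fixed
{
  "mappings": {
    "properties": {
      "user": {
        "properties": {
          "name": {
            "type": "keyword",
            "ignore_above": 1024,
            "fields": {
              "text": {
                "type": "text",
                "norms": false
              }
            }
          }
        }
      }
    }
  }
}

POST _reindex
{
  "source": { "index": "ldap-2020.10.07" },
  "dest": { "index": "ldap-fixed" }
}
```

With user.name mapped as keyword at the top level, the threshold rule's aggregation can target user.name directly instead of needing the .keyword suffix.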

Hi Frank,

correct me if I'm wrong, but doesn't my index have user.name.keyword as a multi-field of user.name?

When I do GET ldap-2020.10.07/_mapping/field/user.name.keyword
I get this output:

{
  "ldap-2020.10.07" : {
    "mappings" : {
      "user.name.keyword" : {
        "full_name" : "user.name.keyword",
        "mapping" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      }
    }
  }
}

I also get the same amount of hits when I do these two queries in dev tools:

regular field:

GET ldap-2020.10.07/_search
{
  "query": {
    "match": {
      "user.name": "my.name"
    }
  }
}

keyword:

GET ldap-2020.10.07/_search
{
  "query": {
    "match": {
      "user.name.keyword": "my.name"
    }
  }
}

Both queries produce 4 hits and return the same documents.
I was initially working with the .keyword field; however, the screenshots were taken when I tried the non-keyword field instead.

It's me again.

I have created a new index called ldap_delme2 and applied the following mapping:

POST ldap_delme2/_mapping
{
  "properties": {
    "@timestamp" : {
      "type" : "date"
    },
    "user.name" : {
      "type": "keyword"
    },
    "event.category": {
      "type": "text"
    },
    "event.outcome" : {
      "type": "text"
    }
  }
}

which leads to this mapping schema:

{
  "ldap_delme2" : {
    "mappings" : {
      "properties" : {
        "@timestamp" : {
          "type" : "date"
        },
        "event" : {
          "properties" : {
            "category" : {
              "type" : "text"
            },
            "outcome" : {
              "type" : "text"
            }
          }
        },
        "user" : {
          "properties" : {
            "name" : {
              "type" : "keyword"
            }
          }
        }
      }
    }
  }
}

I have indexed this event 5 times (timestamps differ by 1 second for each entry):

POST ldap_delme2/_doc/
{
  "@timestamp" : "2020-10-08T17:00:15",
  "user.name" : "test.user",
  "event.category" : "authentication",
  "event.outcome" : "failed"
}

Which leads to an index that looks like this:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 6,
      "relation" : "eq"
    },
    "max_score" : 0.074107975,
    "hits" : [
      {
        "_index" : "ldap_delme2",
        "_type" : "_doc",
        "_id" : "Yf7OCHUBJXM-BF0goJ6i",
        "_score" : 0.074107975,
        "_source" : {
          "@timestamp" : "2020-10-08T17:00:10",
          "user.name" : "test.user",
          "event.category" : "authentication",
          "event.outcome" : "failed"
        }
      },
      {
        "_index" : "ldap_delme2",
        "_type" : "_doc",
        "_id" : "Yv7OCHUBJXM-BF0gp55A",
        "_score" : 0.074107975,
        "_source" : {
          "@timestamp" : "2020-10-08T17:00:11",
          "user.name" : "test.user",
          "event.category" : "authentication",
          "event.outcome" : "failed"
        }
      },
      {
        "_index" : "ldap_delme2",
        "_type" : "_doc",
        "_id" : "Y_7OCHUBJXM-BF0grZ40",
        "_score" : 0.074107975,
        "_source" : {
          "@timestamp" : "2020-10-08T17:00:12",
          "user.name" : "test.user",
          "event.category" : "authentication",
          "event.outcome" : "failed"
        }
      },
      {
        "_index" : "ldap_delme2",
        "_type" : "_doc",
        "_id" : "ZP7OCHUBJXM-BF0gs545",
        "_score" : 0.074107975,
        "_source" : {
          "@timestamp" : "2020-10-08T17:00:13",
          "user.name" : "test.user",
          "event.category" : "authentication",
          "event.outcome" : "failed"
        }
      },
      {
        "_index" : "ldap_delme2",
        "_type" : "_doc",
        "_id" : "Zf7OCHUBJXM-BF0guJ7n",
        "_score" : 0.074107975,
        "_source" : {
          "@timestamp" : "2020-10-08T17:00:14",
          "user.name" : "test.user",
          "event.category" : "authentication",
          "event.outcome" : "failed"
        }
      },
      {
        "_index" : "ldap_delme2",
        "_type" : "_doc",
        "_id" : "Zv7OCHUBJXM-BF0gvp59",
        "_score" : 0.074107975,
        "_source" : {
          "@timestamp" : "2020-10-08T17:00:15",
          "user.name" : "test.user",
          "event.category" : "authentication",
          "event.outcome" : "failed"
        }
      }
    ]
  }
}

I created a new detection rule with these settings:

The rule is unable to produce any signal.
Now in the "Overview" tab I get a Data Fetch error that complains about the use of non-keyword fields for aggregation, even though the user.name field is mapped only as type keyword.

I have created a third index identical to the one I posted here except every field has the type keyword.

I am still unable to produce a single signal.

Well, the good news is that we just logged an issue to no longer allow users to use thresholds on "non-aggregatable" fields, so we can give a better UI/UX experience:

So, thank you for the forum posts and for looking into things.

In the meantime, before that bug fix lands, I am going to explain a bit about keyword/text fields, aggregatable fields, and mapping conflicts, for you and anyone else currently running into this, so you know how and why things are the way they are.

If you look at how the threshold rule works, it uses an aggregation with a minimum document count, just like this:

Code for the lines in question:

So when running dev tools queries, you want to distinguish carefully between aggregations and regular queries, as aggregations are the queries that are picky about fields being of particular types such as "keyword", i.e. "aggregatable types".

Examples below:

The ldap index I have, which will blow up in dev tools because user.name is a text field:

GET ldap-delme/_search
{
  "size": 0,
  "aggregations": {
    "threshold": {
      "terms": {
        "field": "user.name",
        "min_doc_count": 1
      }
    }
  }
}

The error you will get back from ES:

"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [user.name] in order to load field data by uninverting the inverted index. Note that this can use significant memory."

The auditbeat index, which has the "text"/"keyword" mapping reversed (user.name is keyword first, with text nested underneath), will work like so, because keyword is aggregatable:

GET auditbeat-7.9.1/_search
{
  "size": 0,
  "aggregations": {
    "threshold": {
      "terms": {
        "field": "user.name",
        "min_doc_count": 1
      }
    }
  }
}

If I change my first ldap index query to use the explicit user.name.keyword, then this will also work, because I'm using the keyword field explicitly:

GET ldap-delme/_search
{
  "size": 0,
  "aggregations": {
    "threshold": {
      "terms": {
        "field": "user.name.keyword",
        "min_doc_count": 1
      }
    }
  }
}

In the ECS docs, however, ECS wants user.name to be keyword first, with anything additional as .text underneath it:


Which is why I recommend re-indexing and going along with ECS, as that would make the out-of-the-box rules work and make it easier to collaborate with other people on rules and content. Also, if you mix your ldap index mapping together with an auditbeat mapping or other ECS mappings, you will start to get mapping conflicts and other bad behavior, because you cannot mix indexes that have different data types. This can lead to more aggregation woes.

Now, why are you hitting an error with ldap_delme2, though? Well, if we look at event.outcome, it is of type text, which is not aggregatable.

Using Kibana, you can create an index pattern for ldap_delme2, and it will show you which fields are and are not aggregatable:

The first step is to create an index pattern from Stack Management -> Index Patterns:

Afterwards you can see what is and is not aggregatable, and event.outcome is not one of them:
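Besides the Kibana UI, the field capabilities API reports this directly from dev tools (a sketch; the index and field names match the ones above):

```
GET ldap_delme2/_field_caps?fields=user.name,event.outcome
```

The response includes an "aggregatable": true/false flag for each field, so you can confirm aggregatability without leaving dev tools.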

What if I tried to mix auditbeat together with my ldap-delme index, which uses your current mapping? Well, we would see the Kibana index pattern telling us we now have conflicts around user.name, since one index uses a text field and the other a keyword.

I mix a valid ECS index with yours, which has a few fields of type text where we would expect keyword:

Afterwards, I can see I have conflicts in three areas by selecting the data type "conflict":

And I can hover to see what is going on:

Welp, talk about anti-climactic. This was the part that caused all the issues.

Also, just so as not to leave any loose ends:
The detections did trigger on the indices from this post; I just had to wait a couple of hours.