Hi @siginigin,
I shared your reply with the team, and they offered the following:
As an alternative to creating an off_hours field by enriching the data at ingest time (via an ingest pipeline, or in your case, Logstash), you may consider experimenting with defining off_hours as a runtime field.
The following description of runtime fields is from the Runtime fields: Schema on read for Elastic blog post:
Runtime fields enable you to create and query fields that are evaluated only at query time. Instead of indexing all the fields in your data as it’s ingested, you can pick and choose which fields are indexed and which ones are calculated only at runtime as you execute your queries. Runtime fields support new use cases and you won’t have to reindex any of your data.
Exploring a solution based on runtime fields
Note: Runtime fields are beta functionality and subject to change. The details below were shared by a colleague who briefly experimented with runtime fields based on your use case. The information below does not describe a complete solution.
The following POST creates a runtime field named hour_of_day, which extracts just the hour portion from the @timestamp field:
POST test-index/_mapping
{
  "runtime": {
    "hour_of_day": {
      "type": "long",
      "script": {
        "source": """
          emit(doc['@timestamp'].value.hourOfDay)
        """
      }
    }
  }
}
The goal of creating a runtime field like hour_of_day in the example above would be to refer to hour_of_day in a detection rule's KQL query, e.g. hour_of_day >= 22 OR hour_of_day <= 5.
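To make the intent of that query concrete, here is a minimal sketch in plain Java of the same condition. The class and method names are hypothetical and only for illustration; the 22 and 5 thresholds are the ones from the KQL example above, and both bounds are inclusive.

```java
final class OffHoursCheck {
    // Mirrors the KQL condition: hour_of_day >= 22 OR hour_of_day <= 5.
    // The window wraps around midnight, so the two comparisons are joined with OR, not AND.
    static boolean isOffHours(int hourOfDay) {
        return hourOfDay >= 22 || hourOfDay <= 5;
    }

    public static void main(String[] args) {
        System.out.println(isOffHours(23)); // late evening
        System.out.println(isOffHours(3));  // early morning
        System.out.println(isOffHours(12)); // midday
    }
}
```

Note that an AND between the two comparisons would never match, since no hour is both at least 22 and at most 5.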
However, a caveat to the script above is that when a date field is accessed via the doc collection:
emit(doc['@timestamp'].value.hourOfDay)
the timezone information is lost, and hourOfDay is always reported in UTC. This happens because doc uses the indexed field, which under the hood is really just a number representing milliseconds since the epoch.
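The effect can be reproduced outside Elasticsearch. The sketch below is plain Java, not Painless, and only approximates what happens at index time: it assumes the event carries an ISO-8601 timestamp with an offset, and the class name, method names, and sample timestamp are made up for illustration.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

final class EpochHourDemo {
    // Hour as written in the original event string; the offset is preserved.
    static int hourFromSource(String isoTimestamp) {
        return ZonedDateTime.parse(isoTimestamp).getHour();
    }

    // Hour after a round trip through epoch milliseconds, which carry no offset.
    // Reading the value back in UTC is analogous to what the doc-based script reports.
    static int hourFromEpochMillis(String isoTimestamp) {
        long millis = ZonedDateTime.parse(isoTimestamp).toInstant().toEpochMilli();
        return Instant.ofEpochMilli(millis).atZone(ZoneOffset.UTC).getHour();
    }

    public static void main(String[] args) {
        String ts = "2021-03-01T23:30:00-05:00"; // hypothetical sample event time
        System.out.println(hourFromSource(ts));      // 23
        System.out.println(hourFromEpochMillis(ts)); // 4
    }
}
```

For this sample, the local hour 23 would count as off hours either way, but an event at 17:30-05:00 would be read back as hour 22 and flagged incorrectly.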
My colleague found that when fields are accessed via params._source instead of doc, per the example below, it's possible to get the original string value from the event and extract the timezone information from it:
"hour_of_day": {
  "type": "long",
  "script": {
    "source": """
      ZonedDateTime zdt = ZonedDateTime.parse(params._source['mytimestamp']);
      emit(zdt.getHour());
    """
  }
},
The example above doesn't implement error handling for the case where timestamps can't be parsed. Without error handling, a single document containing a non-parsable timestamp could cause the entire query to fail.
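One possible shape for that error handling is sketched below in plain Java rather than Painless. It assumes the timestamp arrives as an arbitrary object from the event source, and the class and method names are hypothetical. An empty result here stands in for a script that simply does not call emit for that document; whether that is the right behavior for your detection rule is something you would need to decide and test.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeParseException;
import java.util.OptionalInt;

final class SafeHourDemo {
    // Returns the hour from the raw timestamp, or empty when the value is
    // missing, not a string, or not parsable as an ISO-8601 zoned date-time.
    static OptionalInt safeHour(Object raw) {
        if (!(raw instanceof String)) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(ZonedDateTime.parse((String) raw).getHour());
        } catch (DateTimeParseException e) {
            // Swallow the parse failure so one bad document does not abort everything.
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(safeHour("2021-03-01T23:30:00-05:00").isPresent()); // true
        System.out.println(safeHour("not a timestamp").isPresent());           // false
        System.out.println(safeHour(null).isPresent());                        // false
    }
}
```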
Thus, neither of the examples above is a complete implementation of hour_of_day as a runtime field, but they can serve as a starting point for experimentation.
Another colleague noted that even if hour_of_day is implemented as a runtime field that works reliably across time zones, so that it can be referred to in KQL, the definition of "normal working hours" can still vary widely, even for users in the same time zone. It may be possible to pivot to a machine-learning-driven approach to this problem, where ML determines what "off hours" means based on the behavior of an individual user over time. The folks here are best-positioned to answer questions about an ML-driven approach, if that's an alternative you're willing to explore at this time.