Transform with fixed time interval and different sync delay

Avarjana · June 24, 2022, 4:26am

I have a transform that identifies anomalies and triggers alerts on one of my indices. The details of its behavior are below.

Expected behavior

Every minute, aggregate the last 5 minutes of data and check for alerts.
Send records to the relevant indexes.

Problematic current behavior

Every 5 minutes, aggregate the last 5 minutes of data and check for alerts.
Send alerts to the relevant indexes.

Cause of the problem
The transform has a fixed interval of 5 minutes in the date_histogram of its group_by, and the sync delay is 60s (1 minute).

"pivot": {
    "group_by": {
        …
        "@timestamp": {
            "date_histogram": {
                "field": "@timestamp",
                "fixed_interval": "5m"
            }
        }
    },
    …

This captures events in the following buckets:

XX:00 - XX:05
XX:05 - XX:10
XX:10 - XX:15
....

The interval can be changed to 1 minute, but then each run would only check the last minute of records for alerts, not the last 5 minutes.

Issue
Alerts are only triggered every five minutes, even if the event happens in the first minute of the interval.

I would highly appreciate any suggestions for this problem.

Hendrik_Muhs · June 24, 2022, 7:10am

By default, a transform creates a bucket only after the bucket is complete; that is, it waits 5 minutes if your date_histogram is configured with a 5-minute interval. This improves performance, because the transform does not need to update documents. You can change this with the setting align_checkpoints, which defaults to true and can be set to false. This tells the transform to process incomplete buckets, at the price of more updates and therefore some performance penalty. You can find this setting in the docs.
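As a rough sketch of what changing that setting could look like, assuming the transform already exists: the body below would be sent to the update transform endpoint (POST _transform/<transform_id>/_update, with your own transform id in place of the placeholder). Only align_checkpoints comes from the explanation above; everything else in your existing configuration stays as it is.

```json
{
  "settings": {
    "align_checkpoints": false
  }
}
```

With this in place, each checkpoint may rewrite the document for the still-open 5-minute bucket, which is where the extra updates come from.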

Note that you will still wait at least 1 minute if your sync delay is set to 60s, because the transform only queries for data that is at least 1 minute old. This setting compensates for ingest delays and for data arriving out of order. If you know that data for your configured timestamp is guaranteed to reach Elasticsearch sooner, you can decrease this setting to further reduce the time it takes to trigger the alert. An even better approach, which compensates for any problem in data ingestion, is to use an ingest timestamp, as explained here. With an ingest timestamp you can decrease the sync delay to e.g. 5s. (You can't decrease it to 0s, because the default refresh interval of a Lucene index is 1s, so I think 2s should be the minimum.)
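For illustration only, a sync section using an ingest timestamp with a short delay might look like the fragment below. The field name event.ingested is an assumption, not something taken from your configuration: it stands for whatever field your ingest pipeline fills with the ingest time (for example via a set processor using {{{_ingest.timestamp}}}). The 5s value is the example delay mentioned above.

```json
{
  "sync": {
    "time": {
      "field": "event.ingested",
      "delay": "5s"
    }
  }
}
```

The date_histogram in group_by can keep using @timestamp; the sync field only controls which documents the transform treats as new at each checkpoint.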

Avarjana · June 27, 2022, 11:37am

Thank you for this great explanation. I highly appreciate the support.

system · July 25, 2022, 11:38am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
