Transforms on data older than a specific timestamp: checkpoints are not created

paulov · February 7, 2022, 2:26pm

Continuing the discussion from ES Continuous Transforms Checkpoint not updating:

Hi,
I'm creating a transform job that transforms data in an index and takes all documents OLDER THAN 6 days (calculating averages etc).
When I run the job I notice that only one checkpoint is created and no more.
I have configured continuous mode, a @timestamp field and a 60sec delay value.

However, every 60 seconds a query for data older than 6 days will return new data (with timestamps between 6 days plus 60 seconds ago and 6 days ago), so I expect a new checkpoint to be created for this data.
This does not happen. Why? I see in the transform statistics

      "checkpointing" : {
        "last" : {
          "checkpoint" : 1,
          "timestamp_millis" : 1644238548355,
          "time_upper_bound_millis" : 1644238440000
        },
        "operations_behind" : 12850,
        "changes_last_detected_at" : 1644238548355
      }

that 12850 more documents are processed but apparently not checkpointed.

Is there an explanation for this? Is transform only designed for examining documents NEWER than a specific timestamp?

Hendrik_Muhs · February 8, 2022, 7:39am

Continuous mode is configured on a date field. The timestamp of that field is used for checkpointing. In your example, time_upper_bound_millis = 1644238440000, which translates to 02/07/2022 @ 12:54pm UTC, is the time upper bound of the checkpoint. If you push data with timestamps before that time, the transform is not able to query for it. That's by design.
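
To make the effect concrete, here is a minimal sketch of the checkpoint window in Python. It is a simplification and not the transform's actual implementation: it assumes each checkpoint only looks at sync-field timestamps between the previous upper bound and the new one, that checkpoints are created every `frequency_ms`, and it ignores bucket rounding. The function names and the 60 second frequency are made up for illustration.

```python
DAY_MS = 24 * 60 * 60 * 1000


def checkpoint_window(now_ms, delay_ms, frequency_ms):
    """Return (lower, upper) sync-field bounds for a checkpoint created at now_ms.

    Assumption: the previous checkpoint was created frequency_ms earlier
    with the same delay, and no bucket rounding is applied.
    """
    upper = now_ms - delay_ms
    lower = upper - frequency_ms
    return lower, upper


def is_picked_up(doc_ts_ms, window):
    """True if the document's sync-field timestamp falls inside the window."""
    lower, upper = window
    return lower <= doc_ts_ms < upper


now_ms = 1644238548355  # timestamp_millis from the stats in the question
doc_ts_ms = now_ms - 6 * DAY_MS - 30_000  # a document that turned 6 days old 30 s ago

short = checkpoint_window(now_ms, delay_ms=60_000, frequency_ms=60_000)
long_ = checkpoint_window(now_ms, delay_ms=6 * DAY_MS, frequency_ms=60_000)

print(is_picked_up(doc_ts_ms, short))  # False: the window covers roughly the last two minutes
print(is_picked_up(doc_ts_ms, long_))  # True: the window is shifted back by 6 days
```

With a 60 second delay the window never reaches documents that are 6 days old, which matches the behaviour described in the question.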

Note the difference between timestamp_millis and time_upper_bound_millis. time_upper_bound_millis is calculated by taking the system time and deducting delay. In this case the value is additionally rounded down to a bucket boundary, because you use a date_histogram in your transform configuration.
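
The numbers from the stats can be checked with a few lines of Python. This is only a sketch of the arithmetic described above. It assumes that timestamp_millis is close to the system time used for the checkpoint and that the date_histogram interval is one minute (the thread does not state the interval); the function name is hypothetical.

```python
from datetime import datetime, timezone


def time_upper_bound_millis(system_time_ms, delay_ms, bucket_ms):
    """System time minus delay, rounded down to a bucket boundary."""
    delayed = system_time_ms - delay_ms
    return delayed - (delayed % bucket_ms)


# timestamp_millis from the stats, 60 s delay, assumed 1 minute buckets
upper = time_upper_bound_millis(1644238548355, 60_000, 60_000)

print(upper)  # 1644238440000
print(datetime.fromtimestamp(upper / 1000, tz=timezone.utc).isoformat())
# 2022-02-07T12:54:00+00:00
```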

To work around your problem, you have 2 options:

1. increase delay: if you increase delay to e.g. 6d, the transform won't miss any data that arrives with a timestamp of now - 6d; however, it also won't process any data in between.
2. use an ingest timestamp for sync: if you add another timestamp field to your source data which contains the date when the data was ingested/indexed in Elasticsearch, you can configure sync to use the ingest timestamp while still pivoting on another timestamp field.
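
As a rough sketch, the two options could look like this as request bodies, built here as Python dicts. The `sync.time.field` and `sync.time.delay` settings are part of the transform configuration; the field name `event.ingested`, the helper `build_sync`, and the pipeline shown for option 2 are assumptions for illustration and not taken from the original configuration.

```python
import json


def build_sync(field, delay):
    """Build the `sync` section of a continuous transform configuration."""
    return {"time": {"field": field, "delay": delay}}


# Option 1: keep syncing on @timestamp, but with a delay of 6 days.
option_1 = build_sync("@timestamp", "6d")

# Option 2: sync on an ingest timestamp (hypothetical field name),
# while the pivot can still group on @timestamp.
option_2 = build_sync("event.ingested", "60s")

# Assumed ingest pipeline that fills the ingest timestamp at index time.
ingest_pipeline = {
    "processors": [
        {"set": {"field": "event.ingested", "value": "{{{_ingest.timestamp}}}"}}
    ]
}

print(json.dumps({"sync": option_1}, indent=2))
print(json.dumps({"sync": option_2}, indent=2))
```

The rest of the transform body (source, dest, pivot) stays as it is; only the sync section changes.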

paulov · February 8, 2022, 10:29am

Thx, I overlooked that delay setting. This works perfectly, and we don't need to process any data in between, only 'older data'.

