Hi,
I'm creating a transform job that transforms data in an index and takes all documents OLDER THAN 6 days (calculating averages etc.).
When I run the job, I notice that only one checkpoint is created and no more after that.
I have configured continuous mode with a @timestamp field and a delay of 60 seconds.
However, every 60 seconds a query for data older than 6 days will match new data (with timestamps between now - 6d - 60s and now - 6d), so I expect a new checkpoint to be created for this data.
This does not happen. Why? I see in the transform statistics
Continuous mode is configured on a date field, and the timestamp in that field is used for checkpointing. In your example, time_upper_bound_millis = 1644238440000, which translates to 02/07/2022 @ 12:54pm in UTC, is the upper time bound of the checkpoint. If you push data with a timestamp before that time, the transform is not able to query for this data. That's by design.
Note the difference between timestamp_millis and time_upper_bound_millis. time_upper_bound_millis is calculated by taking the system time and subtracting delay. In this case the value is additionally rounded down to a bucket boundary, because you use a date_histogram in your transform configuration.
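To make the calculation above concrete, here is a minimal sketch in Python. It is not Elasticsearch's actual implementation: the function names and the sample numbers are made up for illustration, and it assumes a fixed-size bucket interval expressed in milliseconds.

```python
from typing import Optional

def time_upper_bound_millis(now_millis: int, delay_millis: int,
                            bucket_millis: Optional[int] = None) -> int:
    """System time minus delay, optionally rounded down to a bucket boundary."""
    upper = now_millis - delay_millis
    if bucket_millis:
        # Round down to the start of the bucket (fixed intervals only).
        upper -= upper % bucket_millis
    return upper

def is_missed(doc_timestamp_millis: int, last_upper_bound_millis: int) -> bool:
    """A document that arrives with a timestamp below the upper bound of an
    already completed checkpoint is not picked up by later checkpoints."""
    return doc_timestamp_millis < last_upper_bound_millis

# Hypothetical values: a 60s delay and a 60s date_histogram bucket.
now = 1_000_000_123
upper = time_upper_bound_millis(now, 60_000, 60_000)
six_days_millis = 6 * 24 * 3600 * 1000

print(upper)                                    # upper bound of the checkpoint
print(is_missed(now - six_days_millis, upper))  # data stamped 6 days ago
```

With a 60s delay, a document whose timestamp is 6 days old falls far below the upper bound, which is why no new checkpoint is triggered for it.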
To work around your problem you have 2 options:
1. Increase delay: if you increase delay to e.g. 6d, the transform won't miss data whose timestamp is up to 6 days old when it arrives. However, it also won't process any data newer than now - 6d until that data has aged past the delay.
2. Use an ingest timestamp for sync: if you add another timestamp field to your source data that contains the date when the data was ingested/indexed in Elasticsearch, you can configure sync to use that ingest timestamp while still pivoting on another timestamp field.
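As a rough illustration of the ingest timestamp option, the sketch below builds the two request bodies in Python and prints them as JSON. The field name event.ingested, the index names, the value field and the 1d interval are assumptions for the example, not taken from your configuration. The set processor with {{{_ingest.timestamp}}} and the sync.time.field / sync.time.delay settings are standard Elasticsearch options.

```python
import json

# Assumed name for the ingest timestamp field; any date field works.
INGEST_FIELD = "event.ingested"

# Ingest pipeline body: stamps each document with the time it was ingested.
pipeline_body = {
    "description": "Add ingest timestamp",
    "processors": [
        {"set": {"field": INGEST_FIELD, "value": "{{{_ingest.timestamp}}}"}}
    ],
}

# Transform body: sync on the ingest timestamp, pivot on @timestamp.
transform_body = {
    "source": {"index": "my-source-index"},  # hypothetical index name
    "dest": {"index": "my-dest-index"},      # hypothetical index name
    "sync": {"time": {"field": INGEST_FIELD, "delay": "60s"}},
    "pivot": {
        "group_by": {
            "day": {
                "date_histogram": {
                    "field": "@timestamp",
                    "calendar_interval": "1d",  # assumed interval
                }
            }
        },
        "aggregations": {
            "avg_value": {"avg": {"field": "value"}}  # hypothetical field
        },
    },
}

print(json.dumps(pipeline_body, indent=2))
print(json.dumps(transform_body, indent=2))
```

The first body would go to PUT _ingest/pipeline/<id> and the second to PUT _transform/<id>. Documents need to pass through the pipeline when they are indexed (for example via the index.default_pipeline setting) so that the ingest timestamp field is actually populated.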