ES Continuous Transforms Checkpoint not updating

Hi,
This pertains to ES 7.10.2 with X-Pack.
I am trying out transforms because we need an aggregate table of entities that downstream applications can query.
We have a set of indices that the transform reads; in the aggregate index, it should update the first/last timestamp seen for a set of group-by criteria.
I set up the transform in Kibana and made it continuous. I observed the progress of the job via the Kibana transform console. At the start of the job, I could see the index being updated, and the 1st checkpoint was created. As new data came into the source indices, I could see the values in the "preview" updating. The problem is that the destination index has stopped being written to (even for new entities), and no new checkpoints have been created. Frequency is set to a very low value (1min).
The settings that I'm not very certain about after reading the documentation are the date field and delay, but I swapped around several different datetime fields in my data to no avail.

I would like the index to update, but it is only showing updated values in the preview. What are the possible areas that I can check? Thank you!

My use case is that I have some indices with a fixed mapping. There are 2 time fields: reported time and uploaded time. (Uploaded time is slightly behind real time because it comes from an upstream ETL.) I want to create a table that shows, for each combination of fieldA-fieldB, the first/last reported time.
I really think ES transforms fit my use case, but it is not working at the moment and I have run out of ideas for troubleshooting.

P.S.: I can't show any screenshots since the cluster is on an intranet.

Output of GET _transform/<transform-job>/_stats:
I see a checkpointing section in the output; there is a "last" and an "operations_behind", but there is no "next".
There have been no writes to the destination index for several hours, while the source indices are still receiving new entries.
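
For anyone reading that stats output programmatically, here is a minimal sketch of a helper that summarizes the checkpointing section. The sample response is made up for illustration (the values are not from this cluster), and the helper name `summarize_checkpointing` is hypothetical. As far as I understand the stats API, "next" is only present while a checkpoint is in progress.

```python
def summarize_checkpointing(stats_response):
    """Pull the checkpoint-related fields out of a _transform/<id>/_stats response."""
    transform = stats_response["transforms"][0]
    checkpointing = transform.get("checkpointing", {})
    return {
        "state": transform.get("state"),
        "last_checkpoint": checkpointing.get("last", {}).get("checkpoint"),
        # "next" appears only while a new checkpoint is being processed
        "checkpoint_in_progress": "next" in checkpointing,
        "operations_behind": checkpointing.get("operations_behind"),
    }


# Illustrative response only: the id and numbers are invented, not real output.
sample = {
    "transforms": [
        {
            "id": "my-transform",
            "state": "started",
            "checkpointing": {
                "last": {"checkpoint": 1},
                "operations_behind": 1234,
            },
        }
    ]
}

summary = summarize_checkpointing(sample)
print(summary)
```

A result with `checkpoint_in_progress` false and a non-zero `operations_behind` matches the situation described here: source documents exist that the transform has not picked up.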

To run a transform in continuous mode, the sync section is mandatory. It is used to find out what has changed, so that the transform performs only the necessary updates rather than a full re-run.
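
As a rough sketch of where the sync section sits, the snippet below builds a request body for `PUT _transform/<transform-id>` matching the use case in the question. The index names and the field names `reported_time` and `uploaded_time` are assumptions (only fieldA and fieldB are named in the question), and choosing the uploaded time as the sync field is a guess based on it being closer to real time.

```python
import json


def build_transform_body(source_index, dest_index, sync_field,
                         delay="60s", frequency="1m"):
    """Build a continuous pivot transform body (field names are assumed)."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "frequency": frequency,
        # The sync section is what makes the transform continuous.
        "sync": {"time": {"field": sync_field, "delay": delay}},
        "pivot": {
            "group_by": {
                "fieldA": {"terms": {"field": "fieldA"}},
                "fieldB": {"terms": {"field": "fieldB"}},
            },
            "aggregations": {
                "first_reported": {"min": {"field": "reported_time"}},
                "last_reported": {"max": {"field": "reported_time"}},
            },
        },
    }


# Hypothetical index names, for illustration only.
body = build_transform_body("entities-*", "entities-summary", "uploaded_time")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into the Kibana console as the body of the PUT request; the group-by and aggregation names are free to choose.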

sync.time.field must be a date field that holds a real timestamp. A checkpoint is basically a timestamp up to which all data has been processed; the next time the transform triggers, it looks for new data that has arrived after the checkpoint. For the same reason, the transform will not see new documents that carry a historic timestamp. If you don't know which field to use, you can use an ingest pipeline and add an ingest timestamp as described here.
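
A minimal sketch of the ingest timestamp idea: a pipeline with a single `set` processor that copies `_ingest.timestamp` into a field, plus the index setting that applies it by default. The pipeline name `add-ingest-timestamp` and the field name `ingest_time` are assumptions chosen for illustration.

```python
import json

# Hypothetical names, not taken from the thread.
PIPELINE_NAME = "add-ingest-timestamp"
INGEST_FIELD = "ingest_time"

# Body for: PUT _ingest/pipeline/add-ingest-timestamp
pipeline_body = {
    "description": "Stamp each document with the time it was ingested",
    "processors": [
        {"set": {"field": INGEST_FIELD, "value": "{{_ingest.timestamp}}"}}
    ],
}

# Body for: PUT <source-index>/_settings
# Makes the pipeline run for every document indexed without an explicit pipeline.
index_settings_body = {"index": {"default_pipeline": PIPELINE_NAME}}

print(json.dumps(pipeline_body, indent=2))
print(json.dumps(index_settings_body, indent=2))
```

With this in place, `ingest_time` would be the value to use for sync.time.field. Documents indexed before the pipeline was added will not have the field.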

sync.time.delay describes how much time the transform should reserve for ingest processing and for data arriving out of order. For example: assume the timestamps are created on some IoT device and are only sent once every minute as a batch. The network takes some milliseconds, next you have a queue, then ingest, and finally the data enters Elasticsearch. Note the index refresh interval, which adds another second. The delay is basically the sum of the steps I described. In other words, when the transform sees the new data point, it can be, in the worst case, sum(steps) ms old. That's your ingest delay. There is a trick: if you use the suggested ingest timestamp, you skip a lot of the steps above and your worst-case assumption becomes a lot easier.
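
To make the arithmetic concrete, here is a small sketch that sums per-step delays and shows, in a simplified model, which documents a checkpoint would pick up. Only the one-minute batching and the one-second refresh come from the description above; the network, queue and ingest numbers are assumptions. The function `visible_in_checkpoint` is a hypothetical simplification, not the actual transform implementation.

```python
import math
from datetime import datetime, timedelta, timezone

# Worst-case time per step. Network, queue and ingest values are assumed.
steps = {
    "device_batching": timedelta(seconds=60),
    "network": timedelta(milliseconds=50),
    "queue": timedelta(seconds=5),
    "ingest": timedelta(seconds=2),
    "index_refresh": timedelta(seconds=1),
}

worst_case = sum(steps.values(), timedelta())
delay_seconds = math.ceil(worst_case.total_seconds())
delay_setting = f"{delay_seconds}s"  # value for sync.time.delay


def visible_in_checkpoint(doc_ts, last_bound, now, delay):
    """Simplified model: a checkpoint covers last_bound <= ts < now - delay."""
    upper_bound = now - delay
    return last_bound <= doc_ts < upper_bound


now = datetime(2021, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
delay = timedelta(seconds=delay_seconds)
last_bound = now - delay - timedelta(minutes=1)  # previous checkpoint's bound

in_window = visible_in_checkpoint(now - timedelta(seconds=120), last_bound, now, delay)
too_recent = visible_in_checkpoint(now - timedelta(seconds=30), last_bound, now, delay)
historic = visible_in_checkpoint(now - timedelta(hours=1), last_bound, now, delay)

print(delay_setting, in_window, too_recent, historic)
```

In this model the too-recent document is picked up by a later checkpoint, while the document with the historic timestamp is never picked up, which is the failure mode described for a sync field that does not reflect arrival time.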

I hope that helps.
