Starting a continuous Transform

I'm setting up a new ELK instance and want to have some continuous transforms running on it.

So I load up my table definitions and my transform definitions, bootstrap the transform datasets, and tell ELK to start them running.

Then I get this message:

{"type": "server", "timestamp": "2020-09-16T17:31:32,042+08:00", "level": "INFO", "component": "o.e.x.t.t.TransformIndexer", "cluster.name": "Mik_G3_Demo", "node.name": "node-1", "message": "[miks_test_transform] unexpected null aggregations in search response. Source indices have been deleted or closed.", "cluster.uuid": "ORpCsKyuTmC5v1TH6p0mag", "node.id": "JDySMfZWQlKLW9ocoe45ng" }

...and then the transform is stopped ;-(

The problem is that it's getting no data, because the data for the indexes hasn't been loaded yet. I don't want to turn the data on until the transform is running, because otherwise it might miss some of the early data (the sync is set to 60s). I really don't want this to be a race situation...

Any suggestions?

Wait a second: the first run of the transform will pick up all existing data. It tracks a checkpoint from then on, and only looks at docs that are newer than its last checkpoint. If it doesn't have a checkpoint because it was just created, it'll transform everything its query returns. In your case it'll make a new checkpoint every 60s; that's what that setting is for.

If it behaved like you're describing, then you could never transform existing indices, you'd always have to start the transform before loading any data. But that's not the case, I've used it on preexisting indices :).

Did you check to see if that is the case?

Just to add to @Emanuil's answer:

A transform reads all historic data after it starts. Only after that bootstrap checkpoint (checkpoint 1) does it go into continuous mode and process only the changes. If you have a lot of historic data and don't want the transform to read all of it, you can add a range query with a lower boundary in source.
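To illustrate the range query idea, here is a minimal sketch that builds such a source section as a Python dict. The index pattern mydata-*, the field name @timestamp and the cutoff date are assumptions for illustration only; substitute whatever your data actually uses.

```python
import json


def build_source(index_pattern, time_field, lower_bound):
    """Build a transform `source` section that skips docs older than lower_bound."""
    return {
        "index": [index_pattern],
        "query": {
            # Range query with only a lower boundary, so newer docs always match.
            "range": {time_field: {"gte": lower_bound}}
        },
    }


# Hypothetical values: index pattern, timestamp field and cutoff are assumed.
source = build_source("mydata-*", "@timestamp", "2020-09-01T00:00:00Z")
print(json.dumps(source, indent=2))
```

The resulting dict would go under the "source" key of the transform definition.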

If you want to create a transform before data ingest you must:

  • use an index pattern instead of concrete indices, e.g. mydata-*
  • turn off validation on PUT with defer_validation=true
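The two points above could look roughly like this. It is only a sketch: it builds the PUT request with the Python standard library and does not send it. The host, the destination index name mydata-summary, the field names and the pivot are all made up for illustration; only the transform ID comes from the log message in the question.

```python
import json
import urllib.parse
import urllib.request


def build_put_transform(base_url, transform_id, body):
    """Build (but do not send) a PUT _transform request with validation deferred."""
    query = urllib.parse.urlencode({"defer_validation": "true"})
    url = f"{base_url}/_transform/{urllib.parse.quote(transform_id)}?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical transform definition: every name and value below is assumed.
body = {
    "source": {"index": ["mydata-*"]},  # index pattern, not a concrete index
    "dest": {"index": "mydata-summary"},
    "frequency": "60s",
    "sync": {"time": {"field": "@timestamp", "delay": "60s"}},
    "pivot": {
        "group_by": {"host": {"terms": {"field": "host.keyword"}}},
        "aggregations": {"event_count": {"value_count": {"field": "@timestamp"}}},
    },
}

put_request = build_put_transform("http://localhost:9200", "miks_test_transform", body)
print(put_request.get_method(), put_request.full_url)
# To actually send it: urllib.request.urlopen(put_request)
```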

If you want to start a transform before data ingest you must:

  • create the destination index yourself, because transform requires at least 1 source index to get the mappings right
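As a rough sketch of creating the destination index by hand, the snippet below builds a create-index request with explicit mappings. The index name mydata-summary and the two mapped fields are assumptions; the mappings have to match whatever your transform's pivot actually writes.

```python
import json
import urllib.request


def build_create_index(base_url, index_name, properties):
    """Build (but do not send) a PUT request that creates an index with mappings."""
    body = {"mappings": {"properties": properties}}
    return urllib.request.Request(
        f"{base_url}/{index_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical destination mappings: one group-by field and one aggregation result.
properties = {
    "host": {"type": "keyword"},
    "event_count": {"type": "long"},
}

create_request = build_create_index("http://localhost:9200", "mydata-summary", properties)
print(create_request.get_method(), create_request.full_url)
# To actually send it: urllib.request.urlopen(create_request)
```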

If you do the above, the transform can start, and the task will check at the configured frequency whether data has arrived and start transforming it once it's there. Still, the checkpoint keeps progressing, so you can't start a transform and then push historic data afterwards.
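For completeness, starting the transform is a single POST to the _start endpoint. The sketch below only builds the request; the host is an assumption, and the transform ID is the one from the log message in the question.

```python
import urllib.parse
import urllib.request


def build_start_transform(base_url, transform_id):
    """Build (but do not send) a POST _transform/<id>/_start request."""
    url = f"{base_url}/_transform/{urllib.parse.quote(transform_id)}/_start"
    return urllib.request.Request(url, method="POST")


start_request = build_start_transform("http://localhost:9200", "miks_test_transform")
print(start_request.get_method(), start_request.full_url)
# To actually send it: urllib.request.urlopen(start_request)
```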
