Starting a continuous Transform

mykael · September 16, 2020, 1:55pm

I'm setting up a new ELK instance and want to have some continuous transforms running on it.

So I load up my table definitions and my transform definitions, bootstrap the transform datasets, and tell ELK to start them running.

Then I get this message:

{"type": "server", "timestamp": "2020-09-16T17:31:32,042+08:00", "level": "INFO", "component": "o.e.x.t.t.TransformIndexer", "cluster.name": "Mik_G3_Demo", "node.name": "node-1", "message": "[miks_test_transform] unexpected null aggregations in search response. Source indices have been deleted or closed.", "cluster.uuid": "ORpCsKyuTmC5v1TH6p0mag", "node.id": "JDySMfZWQlKLW9ocoe45ng" }

...and then the transform is stopped ;-(

Problem is, it's getting no data because the data for the indices hasn't been loaded yet. I don't want to turn the data on until the transform is running, because if I don't it might miss some of the early data (the sync is set to 60s). I really don't want this to be a race situation...

Any suggestions?

Emanuil · September 17, 2020, 1:57am

Wait a second, the first run of the transform will pick up all existing data. It tracks a checkpoint from then on, and only looks at newer docs since its last checkpoint. If it doesn't have a checkpoint because it was just created, it'll transform everything its query returns. In your case it'll make a new checkpoint every 60s, that's what that setting is for.

If it behaved like you're describing, then you could never transform existing indices, you'd always have to start the transform before loading any data. But that's not the case, I've used it on preexisting indices :).

warkolm · September 17, 2020, 2:04am

Did you check to see if that is the case?

Hendrik_Muhs · September 17, 2020, 6:20am

Just to add to @Emanuil's answer:

The transform reads all historic data after it starts. Only after that bootstrap checkpoint (checkpoint 1) does it go into continuous mode and process only the changes. If you have a lot of historic data and don't want the transform to read all of the old data, you can add a range query with a lower boundary in source.
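As a rough illustration of the range query idea, here is a minimal Python sketch that only builds the `source` section of a transform config as a dict. The field name `@timestamp` and the bound `now-30d/d` are assumptions chosen for the example, and `source_with_lower_bound` is a hypothetical helper, not part of any Elasticsearch client.

```python
import json


def source_with_lower_bound(index_pattern, time_field, lower_bound):
    """Build a transform `source` section that skips docs older than lower_bound."""
    return {
        "index": [index_pattern],
        "query": {
            # Standard range query: only docs with time_field >= lower_bound match.
            "range": {time_field: {"gte": lower_bound}}
        },
    }


# Assumed values for illustration only.
source = source_with_lower_bound("mydata-*", "@timestamp", "now-30d/d")
print(json.dumps(source, indent=2))
```

The resulting dict would go under the `source` key of the transform definition, so the bootstrap checkpoint only reads documents at or after the lower bound.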

If you want to create a transform before data ingest you must:

use an index pattern instead of concrete indices, e.g. mydata-*
turn off validation on PUT with defer_validation=true

If you want to start a transform before data ingest you must:

create the destination index yourself, because transform requires at least 1 source index to get the mappings right
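The steps above could be sketched as the following sequence of REST requests. This Python snippet only builds the requests and sends nothing. The transform id comes from the log line in the question and `mydata-*` from the example above; the destination index name, the field names, the pivot, and reading "the sync is set to 60s" as both `frequency` and `sync.time.delay` are all assumptions for illustration.

```python
import json

TRANSFORM_ID = "miks_test_transform"  # taken from the log message in the question
SOURCE_PATTERN = "mydata-*"           # index pattern, not a concrete index
DEST_INDEX = "mydata-summary"         # hypothetical destination index name


def build_requests():
    """Return (method, path, body) tuples in the order they would be sent."""
    # Destination mappings are created by hand because no source index exists yet.
    dest_mappings = {
        "mappings": {
            "properties": {
                "host": {"type": "keyword"},
                "event_count": {"type": "long"},
            }
        }
    }
    transform = {
        "source": {"index": [SOURCE_PATTERN]},
        "dest": {"index": DEST_INDEX},
        "frequency": "60s",
        "sync": {"time": {"field": "@timestamp", "delay": "60s"}},
        "pivot": {  # assumed pivot: count events per host
            "group_by": {"host": {"terms": {"field": "host"}}},
            "aggregations": {
                "event_count": {"value_count": {"field": "@timestamp"}}
            },
        },
    }
    return [
        ("PUT", f"/{DEST_INDEX}", dest_mappings),
        # defer_validation=true skips the check that source indices exist.
        ("PUT", f"/_transform/{TRANSFORM_ID}?defer_validation=true", transform),
        ("POST", f"/_transform/{TRANSFORM_ID}/_start", None),
    ]


for method, path, body in build_requests():
    print(method, path)
    if body is not None:
        print(json.dumps(body, indent=2))
```

The destination mappings have to match whatever the pivot produces, so they would need adjusting to the real transform definition.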

If you do the above, the transform can start. The task will check at the configured frequency whether data has arrived, and will start transforming once it's there. The checkpoint still progresses, though, so you can't start a transform and push historic data afterwards.
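To make the checkpoint point concrete, here is a deliberately simplified toy model in Python. It is not how Elasticsearch implements checkpoints; it only assumes that each checkpoint covers documents whose timestamp falls between the previous checkpoint's upper bound and "now minus delay", and that checkpoint 1 has no lower bound. `run_checkpoint` and all the numbers are made up for the illustration.

```python
def run_checkpoint(doc_timestamps, last_upper, now, delay):
    """Toy model: return (timestamps picked up by this checkpoint, new upper bound).

    doc_timestamps: event timestamps (in seconds) currently in the source index.
    last_upper: upper bound of the previous checkpoint, or None before checkpoint 1.
    """
    upper = now - delay
    if last_upper is None:
        # Bootstrap checkpoint: everything up to the upper bound is read.
        picked = [t for t in doc_timestamps if t <= upper]
    else:
        # Continuous mode: only timestamps newer than the last checkpoint.
        picked = [t for t in doc_timestamps if last_upper < t <= upper]
    return sorted(picked), upper


docs = [10, 20, 30]  # data present when the transform starts
first, upper = run_checkpoint(docs, None, now=100, delay=60)

docs += [5, 90]  # 5 is historic data pushed late, 90 is fresh data
second, upper = run_checkpoint(docs, upper, now=160, delay=60)

print(first)   # existing data is picked up by checkpoint 1
print(second)  # only the fresh document; the late historic one is skipped
```

In this model the document with timestamp 5 arrives after checkpoint 1 has already moved past it, so no later checkpoint looks at it, which is the situation described above.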

system · October 15, 2020, 6:20am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
