Using transforms and ingest pipelines on data that changes over time

pulsy · November 14, 2023, 9:48am

I'm thinking about using Elastic transforms together with ingest pipelines to create views of MongoDB collections that are spread over multiple database servers, so we can efficiently sort and filter by references across those databases.

My main question is: if we have a transform whose destination index uses an ingest pipeline with an enrich processor, is it possible to run an update-by-query call on the destination index once a reference has changed, so that the enrich processor runs again on matching documents?
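
For illustration, the setup described here could look roughly like the sketch below, written as Python dicts that mirror the JSON request bodies. Every index, pipeline, policy, and field name is hypothetical; only the overall shape (an enrich processor inside the pipeline, and the pipeline referenced from the transform's `dest`) reflects the question.

```python
import json

# All names below ("orders", "orders-view", "customer-policy", ...) are
# hypothetical and only illustrate the shape of the configuration.

# Body for: PUT _ingest/pipeline/enrich-customer
enrich_pipeline = {
    "description": "Attach customer details looked up by customer_id",
    "processors": [
        {
            "enrich": {
                "policy_name": "customer-policy",  # assumed to exist already
                "field": "customer_id",            # reference stored in the view document
                "target_field": "customer",        # where the looked-up data is written
                "ignore_missing": True,
            }
        }
    ],
}

# Body for: PUT _transform/orders-by-customer
transform = {
    "source": {"index": "orders"},
    # The destination index runs every document the transform writes
    # through the ingest pipeline defined above.
    "dest": {"index": "orders-view", "pipeline": "enrich-customer"},
    "pivot": {
        "group_by": {"customer_id": {"terms": {"field": "customer_id"}}},
        "aggregations": {"order_count": {"value_count": {"field": "order_id"}}},
    },
    "sync": {"time": {"field": "updated_at", "delay": "60s"}},
}

print(json.dumps(transform["dest"]))
```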

carly.richmond · November 14, 2023, 11:18am

Hi @pulsy,

You can use an ingest pipeline with an update_by_query, as per this example. Is that what you're thinking of?
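
As a rough sketch of what such a request could look like, the helper below only builds the method, URL, and body for an update-by-query call that names an ingest pipeline through the `pipeline` query parameter. The cluster address, index, pipeline, and field names are assumptions for illustration, and `conflicts=proceed` is an optional choice rather than a requirement.

```python
import json
from urllib.parse import quote, urlencode


def build_update_by_query(base_url, index, pipeline, query=None):
    """Return (method, url, body) for an update-by-query that reruns `pipeline`."""
    # conflicts=proceed is optional; it keeps the request going on version conflicts.
    params = urlencode({"pipeline": pipeline, "conflicts": "proceed"})
    url = f"{base_url.rstrip('/')}/{quote(index, safe='')}/_update_by_query?{params}"
    # Without a query, every document in the index is reprocessed.
    body = {"query": query} if query is not None else {}
    return "POST", url, json.dumps(body)


# Only reprocess the documents that point at the reference that changed.
method, url, body = build_update_by_query(
    "http://localhost:9200",   # assumed cluster address
    "orders-view",             # hypothetical transform destination index
    "enrich-customer",         # hypothetical ingest pipeline with the enrich processor
    query={"term": {"customer_id": "c-42"}},
)
print(method, url)
print(body)
```

Note that an enrich processor reads from the enrich index built by its policy, so if the reference data itself changed, the policy would likely need to be executed again (`POST /_enrich/policy/<policy>/_execute`) before the update-by-query picks up the new values.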

pulsy · November 14, 2023, 11:57am

Yes, exactly. What is unclear to me is: what data does the update-by-query operation use when updating documents?

If the index is created by a transform, how does the update-by-query operation know how to load the source data that the transform used to generate the document?

Or is it just using the existing document in the destination index of the transform?

leandrojmp · November 14, 2023, 1:05pm

You would need to manually run the update_by_query request on the destination index, and you can set an ingest pipeline to run with this request as well.

pulsy · November 14, 2023, 1:24pm

"You would need to manually run the update_by_query request on the destination index, and you can set an ingest pipeline to run with this request as well."

That's what I assumed, yes. But what I still don't understand is this:

The destination index is created by a transform that is fed by some aggregations.
When I manually run an update-by-query request on the destination index, what does the update operation actually do? Which data does it use to update the matching documents?

Since I don't supply any values in the request that should be set on the matching documents, is it just running the ingest pipeline on the existing document? (That is what I assume it does.)

leandrojmp · November 14, 2023, 1:55pm

It uses the _source field of each document. What gets updated depends on whether you supply an update script in the request or just run an ingest pipeline with the enrich processor.

From the documentation, I would assume that it just reruns the ingest pipeline on the current data. Whether this updates anything depends on the ingest pipeline and on what your data looks like.
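
To illustrate that assumption, here is a toy model in plain Python. It is not Elasticsearch code; the dicts stand in for the destination index and the enrich lookup data, and all names and values are made up. It shows the existing `_source` being fed through the pipeline again, so the aggregated fields written by the transform are kept and only the enriched field changes.

```python
def enrich_processor(doc, lookup, field="customer_id", target_field="customer"):
    """Toy stand-in for an enrich processor: copy matching lookup data into the doc."""
    updated = dict(doc)  # the existing _source is the only input
    match = lookup.get(doc.get(field))
    if match is not None:
        updated[target_field] = match
    return updated


def update_by_query(index, pipeline, matches):
    """Toy stand-in: rerun `pipeline` on each matching doc and write it back in place."""
    for doc_id, source in list(index.items()):
        if matches(source):
            index[doc_id] = pipeline(source)


# Destination index as the transform wrote it, enriched with the old name.
dest_index = {
    "c-42": {"customer_id": "c-42", "order_count": 7, "customer": {"name": "Old Name"}},
    "c-43": {"customer_id": "c-43", "order_count": 2, "customer": {"name": "Other"}},
}
# Lookup data after the reference changed.
lookup = {"c-42": {"name": "New Name"}, "c-43": {"name": "Other"}}

update_by_query(
    dest_index,
    pipeline=lambda src: enrich_processor(src, lookup),
    matches=lambda src: src["customer_id"] == "c-42",
)
print(dest_index["c-42"])
```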

But I'm also not sure if this will impact the transform or not.

You would need to duplicate your transform and destination index to be able to test this safely.
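
One way to prepare such a copy, sketched under assumptions: take the configuration of the existing transform (for example as returned by `GET _transform/<id>`), keep only the settings that describe what it does, and point it at a separate test index before creating it under a new id. The helper name, the list of copied keys, and the sample config are hypothetical.

```python
import copy

# Top-level settings carried over to the copy. Ids, versions, and timestamps
# reported for the original transform are deliberately left out.
COPIED_KEYS = ("source", "dest", "pivot", "latest", "sync", "frequency",
               "description", "settings", "retention_policy")


def clone_transform_config(config, test_dest_index):
    """Return a copy of `config` that writes to `test_dest_index` instead."""
    clone = {key: copy.deepcopy(config[key]) for key in COPIED_KEYS if key in config}
    # Keep dest.pipeline (if any) so the test index is enriched the same way.
    clone.setdefault("dest", {})["index"] = test_dest_index
    return clone


# Hypothetical config of the existing transform.
original = {
    "id": "orders-by-customer",
    "version": "8.11.0",
    "create_time": 1699955280000,
    "source": {"index": "orders"},
    "dest": {"index": "orders-view", "pipeline": "enrich-customer"},
    "pivot": {
        "group_by": {"customer_id": {"terms": {"field": "customer_id"}}},
        "aggregations": {"order_count": {"value_count": {"field": "order_id"}}},
    },
}

test_config = clone_transform_config(original, "orders-view-test")
print(test_config["dest"])
```

The resulting dict could then be sent as the body of `PUT _transform/<new-id>`, and the update-by-query experiment run against the test index only.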

