Using transforms and ingest pipelines on data that changes over time

I'm thinking about using Elastic transforms together with ingest pipelines to create what are basically views of MongoDB collections that are spread over multiple database servers, so we can efficiently sort and filter by references across those databases.

My main question is: if we have a transform whose destination index uses an ingest pipeline with an enrich processor, is it possible to run an update-by-query call on the destination index once a reference changes, so that the enrich processor runs again on the matching documents?
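
A minimal sketch of the setup in this question, written as Python that only builds the request bodies (nothing is sent to a cluster). The names `customer-lookup`, `enrich-customer`, `customers`, `customer-view`, `customer_id`, `name`, `tier` and `customer` are hypothetical placeholders, not taken from the thread. Here the pipeline is attached through the transform's `dest.pipeline` option; setting `index.default_pipeline` on the destination index would be an alternative.

```python
import json

# Hypothetical names used only for illustration.
POLICY = "customer-lookup"    # enrich policy over the reference data
PIPELINE = "enrich-customer"  # ingest pipeline used by the transform destination


def enrich_policy(source_index, match_field, enrich_fields):
    """Body for PUT /_enrich/policy/<name>: a 'match' policy over the reference index."""
    return {
        "match": {
            "indices": source_index,
            "match_field": match_field,
            "enrich_fields": enrich_fields,
        }
    }


def enrich_pipeline(policy, field, target_field):
    """Body for PUT /_ingest/pipeline/<id>: a single enrich processor."""
    return {
        "processors": [
            {
                "enrich": {
                    "policy_name": policy,
                    "field": field,
                    "target_field": target_field,
                    "ignore_missing": True,  # skip docs that lack the reference field
                }
            }
        ]
    }


def transform_dest(index, pipeline):
    """'dest' section of a transform config: every doc the transform writes goes through 'pipeline'."""
    return {"index": index, "pipeline": pipeline}


requests = [
    ("PUT", f"/_enrich/policy/{POLICY}",
     enrich_policy("customers", "customer_id", ["name", "tier"])),
    ("PUT", f"/_enrich/policy/{POLICY}/_execute", None),
    ("PUT", f"/_ingest/pipeline/{PIPELINE}",
     enrich_pipeline(POLICY, "customer_id", "customer")),
]
for method, path, body in requests:
    print(method, path)
    if body is not None:
        print(json.dumps(body, indent=2))
print("transform dest:", json.dumps(transform_dest("customer-view", PIPELINE)))
```
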

Hi @pulsy,

You can use an ingest pipeline with an update_by_query, as per this example. Is that what you're thinking of?
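
A hedged sketch of the request sequence this suggests once a referenced document changes, again as Python that only builds the requests. It assumes the hypothetical policy `customer-lookup`, pipeline `enrich-customer`, destination index `customer-view`, and a keyword field `customer_id` holding the reference. The extra `_execute` call is there because an enrich processor reads from the enrich index created when the policy was last executed, not from the live source index, so re-running the pipeline alone would still see the old reference data.

```python
from urllib.parse import urlencode


def refresh_enrichment(dest_index, policy, pipeline, ref_field, changed_ids):
    """Requests to re-enrich destination docs after the referenced data changed.

    Returns (method, path, body) tuples; nothing is sent to a cluster.
    """
    params = urlencode({"pipeline": pipeline, "conflicts": "proceed"})
    return [
        # The enrich processor reads from the enrich index, which is a snapshot
        # built when the policy is executed, so rebuild it first.
        ("PUT", f"/_enrich/policy/{policy}/_execute", None),
        # Re-run the ingest pipeline on the stored _source of the affected docs only.
        (
            "POST",
            f"/{dest_index}/_update_by_query?{params}",
            {"query": {"terms": {ref_field: list(changed_ids)}}},
        ),
    ]


for method, path, body in refresh_enrichment(
    "customer-view", "customer-lookup", "enrich-customer", "customer_id", ["c1", "c7"]
):
    print(method, path, body if body is not None else "")
```

Restricting the query to the changed reference ids keeps the re-enrichment limited to the documents that could actually be affected, instead of reprocessing the whole destination index.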

Yes, exactly. What is unclear to me is: what data does the update-by-query operation use when updating documents?

If the index is created by a transform, how does the update-by-query operation know how to load the source documents that the transform used to generate each document?

Or does it just use the existing document in the transform's destination index?

You would need to manually run the update_by_query request on the destination index, and you can set an ingest pipeline to run with this request as well.

That's what I assumed, yes. But what I still don't understand is this:

  • The destination index is created by a transform, that is fed by some aggregations.
  • When I manually run an update-by-query request on the destination index, what does the update operation actually do? Which data does it use to update the matching documents?

Since I don't supply any actual values in the request to set on the matching documents, is it just running the ingest pipeline on the existing documents? (That is what I assume it does.)

It uses the _source field of each document. What gets updated depends on whether the request also includes an update script or only runs an ingest pipeline with the enrich processor.

From the documentation, I would assume it just reruns the ingest pipeline on the current data; whether that actually changes anything depends on the ingest pipeline and on what your data looks like.
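
To make this concrete, here is a toy, in-memory Python model of that behavior (not Elasticsearch code): each matching document's stored `_source` is copied, passed through the pipeline's processors in order, and written back under the same id. Nothing from the transform's source indices is consulted; the enrich step only sees the document itself plus a lookup table. The `enrich_processor` function, the `enrich_index` dict and the sample documents are hypothetical, simplified stand-ins for a `match` enrich policy.

```python
import copy


def enrich_processor(doc, enrich_index, field, target_field):
    """Simplified enrich 'match' lookup: copy the matching reference entry into target_field."""
    match = enrich_index.get(doc.get(field))
    if match is not None:
        doc[target_field] = dict(match)  # overwrite any previously enriched value
    return doc


def update_by_query(dest_index, query, pipeline):
    """Toy model: run each matching doc's stored _source through the pipeline and write it back."""
    matched = 0
    for doc_id, source in list(dest_index.items()):
        if query(source):
            new_source = copy.deepcopy(source)  # start from the stored _source only
            for processor in pipeline:
                new_source = processor(new_source)
            dest_index[doc_id] = new_source  # same id, replaced document
            matched += 1
    return matched


# Destination docs as a transform might have written them (hypothetical data).
dest = {
    "1": {"customer_id": "c1", "order_total": 30, "customer": {"name": "Old name"}},
    "2": {"customer_id": "c2", "order_total": 12},
}
# Reference data after the change, i.e. what a re-executed enrich policy would hold.
enrich_index = {"c1": {"name": "New name"}}
pipeline = [lambda d: enrich_processor(d, enrich_index, "customer_id", "customer")]

n = update_by_query(dest, lambda s: s["customer_id"] == "c1", pipeline)
print(n, dest["1"]["customer"])  # 1 {'name': 'New name'}
```

Document `2` does not match the query and is left untouched, and `order_total` in document `1` survives only because it was already part of that document's `_source`.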

But I'm also not sure if this will impact the transform or not.

You would need to duplicate your transform and destination index to be able to test this safely.
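
One way to make that duplicate, sketched in Python under assumptions: take the transform config returned by `GET _transform/<id>`, drop the fields we assume are server-generated and not accepted by `PUT _transform/<new_id>` (`id`, `version`, `create_time`, `authorization`), and point `dest.index` at a new index so the copy never writes into the original destination. The transform name `customer-view` and the config shown are hypothetical.

```python
import copy
import json

# Fields we assume are server-generated in GET _transform/<id> output and must
# not be sent back when creating the copy.
READ_ONLY_FIELDS = ("id", "version", "create_time", "authorization")


def duplicate_transform(config, new_dest_index):
    """Return a copy of a transform config that writes to its own destination index."""
    clone = copy.deepcopy(config)  # never mutate the original config
    for field in READ_ONLY_FIELDS:
        clone.pop(field, None)
    clone["dest"]["index"] = new_dest_index  # dest.pipeline, if set, is kept
    return clone


# Hypothetical config, shaped like one entry of the "transforms" array from GET _transform/<id>.
original = {
    "id": "customer-view",
    "version": "8.x",
    "create_time": 1700000000000,
    "source": {"index": ["orders"]},
    "dest": {"index": "customer-view", "pipeline": "enrich-customer"},
    "pivot": {
        "group_by": {"customer_id": {"terms": {"field": "customer_id"}}},
        "aggregations": {"order_total": {"sum": {"field": "order_total"}}},
    },
}

clone = duplicate_transform(original, "customer-view-test")
# Body for PUT _transform/customer-view-test, then POST _transform/customer-view-test/_start
print(json.dumps(clone, indent=2))
```

Keeping `dest.pipeline` in the copy means the test transform exercises the same enrich pipeline, while update-by-query experiments run only against `customer-view-test`.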
