I'm thinking about using Elastic transforms together with ingest pipelines to create views of MongoDB collections that are spread over multiple database servers, so we can efficiently sort and filter by references across those databases.
My main question is: if we have a transform whose destination index uses an ingest pipeline with an enrich processor, is it possible to make an update-by-query call on the destination index once a reference has changed, so that the enrich processor runs again on the matching documents?
Yes, exactly. What is unclear to me is: what data does the update-by-query operation use when updating documents?
Because if the index is created by a transform, then how does the update-by-query operation know how to load the source of that document that the transform used to generate the doc?
Or is it just using the existing document in the destination index of the transform?
You would need to manually run the update_by_query request on the destination index, and you can set an ingest pipeline to run with this request as well.
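To make that concrete, here is a minimal sketch of such a request in Python using only the standard library. The cluster URL, the index name `orders-view`, the pipeline name `enrich-customer` and the field `customer_id` are all made-up placeholders, and `conflicts=proceed` is an optional choice on my part for the case where a transform is still writing to the index.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def build_update_by_query(base_url, dest_index, pipeline, query=None):
    """Return (url, body) for an update-by-query call that reruns `pipeline`."""
    # `pipeline` names the ingest pipeline to apply to every matched document.
    # `conflicts=proceed` (optional) keeps going on version conflicts.
    params = urlencode({"pipeline": pipeline, "conflicts": "proceed"})
    url = f"{base_url}/{quote(dest_index)}/_update_by_query?{params}"
    # Without a query, every document in the index is matched.
    body = {"query": query} if query is not None else {}
    return url, json.dumps(body).encode("utf-8")


def send(url, body):
    """POST the request; only call this against a cluster you can reach."""
    req = Request(url, data=body, method="POST",
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)


# Hypothetical names: adjust to your own cluster, index, pipeline and field.
url, body = build_update_by_query(
    "http://localhost:9200",
    "orders-view",
    "enrich-customer",
    query={"term": {"customer_id": "c-42"}},
)
print(url)
```

Note that no new field values are sent, only a query that selects which existing documents get passed through the pipeline again. As far as I know, the enrich processor reads from a snapshot-style enrich index, so the enrich policy would need to be executed again before this call picks up a changed reference.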
That's what I assumed, yes. But there is one thing I still don't understand.
The destination index is created by a transform that is fed by some aggregations.
When I manually run an update-by-query request on the destination index, what does the update operation actually do? Which data does it use to update the matching documents?
Since I don't supply any values in the request that should be set on the matching documents, is it just running the ingest pipeline on the existing documents? (That is what I assume it does.)
It uses the _source field of each document. What gets updated depends on whether you use an update script or just run an ingest pipeline with the enrich processor.
From the documentation I would assume that it just reruns the ingest pipeline on the current data. Whether this changes anything depends on the ingest pipeline and on what your data looks like.
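One way to check what the pipeline would do to a given document, without touching the index, is the simulate pipeline API. The sketch below only builds that request. The pipeline name and the sample document are invented for illustration, and you would paste in the _source of a real document from your destination index instead.

```python
import json
from urllib.parse import quote


def build_simulate_request(base_url, pipeline, sources):
    """Return (url, body) for POST /_ingest/pipeline/<pipeline>/_simulate."""
    url = f"{base_url}/_ingest/pipeline/{quote(pipeline)}/_simulate"
    # Each entry is wrapped as {"_source": ...}, mirroring the stored document
    # that an update-by-query would feed into the pipeline.
    body = {"docs": [{"_source": source} for source in sources]}
    return url, json.dumps(body).encode("utf-8")


# Hypothetical pipeline name and sample document.
url, body = build_simulate_request(
    "http://localhost:9200",
    "enrich-customer",
    [{"customer_id": "c-42", "total": 10}],
)
print(url)
print(body.decode("utf-8"))
```

Comparing the _source returned by the simulation with the one you sent in should show whether rerunning the pipeline would change the document at all.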
But I'm also not sure if this will impact the transform or not.
You would need to duplicate your transform and destination index to be able to test this safely.
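A rough sketch of how the duplicate could be prepared: take the existing transform configuration (for example from GET _transform/<id>, which I believe also accepts exclude_generated=true), point it at a new destination index, and create it under a new transform ID. The list of generated fields to strip and every name in the example are assumptions on my side.

```python
import copy

# Fields assumed to be added by Elasticsearch rather than by the user.
GENERATED_FIELDS = ("id", "version", "create_time", "authorization")


def clone_transform_config(config, new_dest_index):
    """Return a copy of `config` that writes to `new_dest_index`."""
    clone = copy.deepcopy(config)  # leave the original config untouched
    for key in GENERATED_FIELDS:
        clone.pop(key, None)
    clone.setdefault("dest", {})["index"] = new_dest_index
    return clone


# Hypothetical transform configuration, shaped like a pivot transform.
original = {
    "id": "orders-view-transform",
    "version": "8.0.0",
    "create_time": 1,
    "source": {"index": ["orders"]},
    "dest": {"index": "orders-view", "pipeline": "enrich-customer"},
    "pivot": {
        "group_by": {"customer_id": {"terms": {"field": "customer_id"}}},
        "aggregations": {"total": {"sum": {"field": "amount"}}},
    },
}

test_config = clone_transform_config(original, "orders-view-test")
print(test_config["dest"])
```

The resulting dictionary could then be sent as the body of PUT _transform/<new-id>, so the update-by-query experiments run against the copy and not against the real destination index.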