Coming from classic SQL and NoSQL databases, I assumed it must be possible to change documents in Elasticsearch data stream indices. My first attempt, via update_by_query (REST API), worked in testing but not in production, where the change rate is relatively high (more than 10 changes per second). My next attempt, routing the requests through a message broker and a queue with prefetch=1 (only one request at a time), worked better but also failed at high transaction rates. Deleting the existing document (delete_by_query) and creating a new one with the changed content worked better again, but not without problems.
It seems to me that it is impossible to reliably change documents in Elasticsearch data streams.
A data stream lets you store append-only time series data across multiple indices while giving you a single named resource for requests. Data streams are well-suited for logs, events, metrics, and other continuously generated data.
I would suggest rethinking your approach. What exactly are you trying to do? We may be able to offer alternatives if we better understand your use case.
I thought it would be possible to have documents with a status field that changes whenever the document gets another status (e.g. new, checked-out, checked-in, available).
Sometimes it really helps to read the docs carefully.
If I apply the expected document updates directly to the backing indices of the data stream using the Index API, everything works like a charm.
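For anyone finding this later, here is a rough sketch of what that looks like, under some assumptions of mine. It only builds the two REST requests and does not send them: a search against the data stream to find the backing index, `_seq_no` and `_primary_term` of the document, then an Index API `PUT` against that backing index with `if_seq_no` and `if_primary_term` so a concurrent change is rejected instead of silently overwritten. The function names, the data stream name and the document values are made up for illustration; the `status` field is the one from my use case.

```python
from urllib.parse import quote, urlencode


def build_lookup_request(data_stream, doc_id):
    """Build a search that returns the backing index and version info of one document."""
    path = f"/{quote(data_stream, safe='')}/_search"
    body = {
        # Ask Elasticsearch to include _seq_no and _primary_term in each hit.
        "seq_no_primary_term": True,
        "query": {"ids": {"values": [doc_id]}},
    }
    return "GET", path, body


def build_update_request(hit, changes):
    """Build an Index API request that overwrites the document in its backing index."""
    # The index request replaces the whole document, so merge the changes into
    # the existing source (this keeps the required @timestamp field).
    new_source = {**hit["_source"], **changes}
    params = urlencode({
        "if_seq_no": hit["_seq_no"],
        "if_primary_term": hit["_primary_term"],
    })
    path = f"/{quote(hit['_index'], safe='')}/_doc/{quote(hit['_id'], safe='')}?{params}"
    return "PUT", path, new_source


# Hypothetical search hit, shaped like an entry of hits.hits in a search response.
example_hit = {
    "_index": ".ds-my-data-stream-2099.03.08-000003",
    "_id": "bfspvnIBr7VVZlfp2lqX",
    "_seq_no": 0,
    "_primary_term": 1,
    "_source": {"@timestamp": "2099-03-08T11:06:07.000Z", "status": "new"},
}

method, path, body = build_update_request(example_hit, {"status": "checked-out"})
print(method, path)
print(body)
```

If the document changed between the lookup and the `PUT`, Elasticsearch answers with a version conflict (HTTP 409), so the caller has to look the document up again and retry. I have not measured how this behaves at my production change rate beyond what I described above.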