In a continuous transform, when a new document is added to the source index and the transform has to update one of its own documents, the aggregation calculations appear to be restarted from scratch. As a result, if documents previously selected by the transform are no longer present in the source index, they are no longer taken into account in the aggregation calculations and their information is lost.
Is there a way to tell the transform to update the aggregation fields based only on the newly retrieved documents, keeping the previously calculated information intact?
For instance, in a value_count aggregation, the previously calculated count would simply be incremented based on the new documents alone.
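For concreteness, this is a minimal sketch of the kind of continuous transform I mean, written with the elasticsearch-py client (the index, field, and transform names are made up for illustration):

```python
# Hypothetical continuous transform: group events by user and count them.
# On each checkpoint, the count for every changed user is recomputed from
# whatever documents currently exist in the source index.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.transform.put_transform(
    transform_id="events-per-user",              # hypothetical name
    source={"index": "events"},                  # hypothetical source index
    dest={"index": "events-per-user"},
    pivot={
        "group_by": {"user_id": {"terms": {"field": "user_id"}}},
        "aggregations": {
            "event_count": {"value_count": {"field": "event_id"}}
        },
    },
    sync={"time": {"field": "@timestamp", "delay": "60s"}},
)
```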
Thank you for your feedback. Updating is currently not possible, but this is on our list for the future. However, the aggregation is not restarted from scratch; only the changed entities are updated. Still, transform requires that the source data does not get deleted.
If you are looking for compaction, rollup might be the better tool for you.
To give you a bit more background: updating is simple for count, min, and max, a bit more complicated for avg, and very complex for e.g. cardinality or percentiles. For anything that requires scripts, we would need a user-supplied merge/combine method. In other words, this is harder than it seems; the sketch below illustrates the difference.
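Here is a toy Python sketch (plain Python, not transform internals) of the intermediate state an updatable aggregation would need. Count, min, and max merge directly; avg only merges if you also carry sum and count; cardinality would need a mergeable sketch such as HyperLogLog, which is omitted here:

```python
from dataclasses import dataclass


@dataclass
class Partial:
    """Mergeable partial state for count/min/max/avg over one entity."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def merge(self, other: "Partial") -> "Partial":
        # Merging two partials never needs the original documents --
        # that is what makes these aggregations updatable in principle.
        return Partial(
            count=self.count + other.count,
            total=self.total + other.total,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
        )

    @property
    def avg(self) -> float:
        # avg is recoverable from (sum, count), but not from avg alone.
        return self.total / self.count if self.count else 0.0


old, new = Partial(), Partial()
for v in [1.0, 5.0]:
    old.add(v)       # previously checkpointed state
new.add(9.0)         # state from the new documents only
print(old.merge(new).avg)  # 5.0 -- no raw documents needed
```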
Still, there are use cases like yours where updating would be beneficial. Another use case might be performance related: for large amounts of data, an update should be more performant than the rewrite we do at the moment.