The transform you posted is a so-called batch transform: it runs only once and calculates the average based on the data available at that point.
However, I assume you plan to turn this into a continuous transform. A continuous transform recalculates the average whenever it processes new data. If you delete data, the bucket/document in the destination index keeps its value at first; but if that bucket is later recalculated, the average is recomputed from the data that is still available and therefore changes because of the deletion, too.
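For reference, here is a minimal sketch of what that could look like: the sync section is what makes the transform continuous. The index names, the @timestamp field and the 60s delay are placeholders, not taken from your config (and on older versions the endpoint is _data_frame/transforms rather than _transform):

```
PUT _transform/compact_example
{
  "source": { "index": "source-index" },
  "dest":   { "index": "dest-index" },
  "pivot": {
    "group_by": {
      "pivot_term": { "terms": { "field": "pivot_term" } }
    },
    "aggregations": {
      "avg_value": { "avg": { "field": "value" } }
    }
  },
  "sync": {
    "time": { "field": "@timestamp", "delay": "60s" }
  }
}
```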
In other words: if your pivot_term is a unique id, this isn't a problem, because that bucket would never be recalculated. If it isn't unique, all aggregations are recalculated on the remaining data. In use cases like yours, it is useful to add min and max fields so you know when a bucket was last updated and what its earliest data point is. That way you can also filter out stale buckets when you search the destination index.
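For example, something like this in the pivot (avg_value, earliest_doc, latest_doc and @timestamp are placeholder names):

```
"aggregations": {
  "avg_value":    { "avg": { "field": "value" } },
  "earliest_doc": { "min": { "field": "@timestamp" } },
  "latest_doc":   { "max": { "field": "@timestamp" } }
}
```

Then, assuming latest_doc ends up mapped as a date in the destination index (you may need to create the destination index with explicit mappings before starting the transform), you can filter out buckets that have not been updated recently, e.g.:

```
GET dest-index/_search
{
  "query": {
    "range": { "latest_doc": { "gte": "now-7d/d" } }
  }
}
```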
Your use case looks like data compaction, so rollup might be a better fit.
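For comparison, a rollup job could look roughly like this; the index names, cron schedule, interval and metrics are placeholders, not a recommendation for your data:

```
PUT _rollup/job/compact_example
{
  "index_pattern": "source-index-*",
  "rollup_index": "rollup-index",
  "cron": "0 0 * * * ?",
  "page_size": 1000,
  "groups": {
    "date_histogram": { "field": "@timestamp", "fixed_interval": "1h" },
    "terms": { "fields": [ "pivot_term" ] }
  },
  "metrics": [
    { "field": "value", "metrics": [ "avg", "min", "max" ] }
  ]
}
```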
I hope this helps!
(We are looking into further transform improvements, so it's very useful for us to hear about use cases like this.)
That answers my question very clearly. I was indeed planning on doing a continuous transform, though it was not apparent from my question or code.
While data compaction is what I am trying to achieve, rollup by itself will probably create multiple records for each unique pivot_term (one per time bucket), whereas I want a single record per unique pivot_term. I believe I can achieve that with some post-processing on the rollup index.
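Something like a _rollup_search with a terms aggregation on pivot_term might cover that post-processing, assuming the rollup job was configured with the avg metric on the value field (names here are placeholders, following the sketch above):

```
GET rollup-index/_rollup_search
{
  "size": 0,
  "aggregations": {
    "per_term": {
      "terms": { "field": "pivot_term" },
      "aggregations": {
        "avg_value": { "avg": { "field": "value" } }
      }
    }
  }
}
```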