Avoid recalculation from scratch of Transform aggregation

Hi,

In a continuous transform, when a new document is added to the source index and the transform has to update one of its own documents, the aggregation calculations appear to be restarted from scratch. As a result, if documents previously selected by the transform are no longer present in the source index, they are no longer taken into account in the aggregation calculations and their information is lost.

Is there a way to tell the transform to update the aggregation fields based only on the newly retrieved documents, keeping the previously calculated information intact?

For instance, in a value_count aggregation, the previously calculated count would simply be incremented by the number of new documents.

Hi,

Thank you for your feedback. Updating is currently not possible, but it is on our list for the future. However, the aggregation is not restarted from scratch; only the changed entities are recalculated. Still, transforms require that source documents do not get deleted.

If you are looking for compaction, rollup might be the better tool for you.

To give you a bit more background: updating is simple for count, min, and max, a bit more complicated for avg, and very complex for e.g. cardinality or percentiles. For anything that requires scripts, we would need a user-supplied merge/combine method. In other words, this is harder than it seems.
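The difference can be sketched in a few lines (a hypothetical illustration of mergeable vs non-mergeable aggregations, not how transforms are implemented internally; the function names are made up for this example):

```python
# Sketch: why some aggregations are easy to update incrementally.
# Illustration only -- not Elasticsearch transform internals.

def update_count(old_count, new_values):
    # count is mergeable: just add the number of new values
    return old_count + len(new_values)

def update_min(old_min, new_values):
    # min/max are mergeable from the final value alone
    return min([old_min] + new_values)

def update_avg(old_sum, old_count, new_values):
    # avg is only mergeable if we keep sum and count separately;
    # the final average by itself is not enough to update
    new_sum = old_sum + sum(new_values)
    new_count = old_count + len(new_values)
    return new_sum, new_count, new_sum / new_count

# cardinality: the final distinct count cannot be updated from the
# number alone -- you would need the underlying sketch (e.g. a
# HyperLogLog) to know whether a new value was already seen.

print(update_count(10, [4, 7]))    # 12
print(update_min(3, [4, 7]))       # 3
print(update_avg(30, 10, [4, 7]))  # (41, 12, 3.4166...)
```

This is why a scripted aggregation would need a user-supplied combine step: the transform cannot know how to merge an arbitrary script's result with new data.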

Still, there are use cases like yours where updating would be beneficial. Another use case might be performance related: for large amounts of data, an incremental update should be more performant than the full rewrite done at the moment.

Hi,

Thank you for your very clear answer. Indeed, I now see the complexity of the matter for some aggregations.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.