The transform you posted is a so-called batch transform: it runs only once and calculates the average based on the data available at that point.
However, I assume you plan to turn this into a continuous transform. A continuous transform recalculates the average whenever it processes new data. If you delete data, the bucket/document in the destination index keeps its value at first; but if that bucket is later recalculated, the average is recomputed from the data that is still available and therefore changes because of the deletion, too.
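For reference, here is a minimal sketch of what that could look like: the sync section is what makes the transform continuous. The index names, the @timestamp field and the 60s delay are placeholders, not taken from your config (and on older versions the endpoint is _data_frame/transforms rather than _transform):

```
PUT _transform/compact_example
{
  "source": { "index": "source-index" },
  "dest":   { "index": "dest-index" },
  "pivot": {
    "group_by": {
      "pivot_term": { "terms": { "field": "pivot_term" } }
    },
    "aggregations": {
      "avg_value": { "avg": { "field": "value" } }
    }
  },
  "sync": {
    "time": { "field": "@timestamp", "delay": "60s" }
  }
}
```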
In other words: if your pivot_term is a unique id, this isn't a problem, because that bucket would never be recalculated. If it isn't unique, all aggregations are recalculated on the remaining data. In use cases like yours, it is useful to add min and max fields so you know when a bucket was last updated and what its earliest data point is. That way you can also filter out stale buckets when you search the destination index.
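For example, something like this in the pivot (avg_value, earliest_doc, latest_doc and @timestamp are placeholder names):

```
"aggregations": {
  "avg_value":    { "avg": { "field": "value" } },
  "earliest_doc": { "min": { "field": "@timestamp" } },
  "latest_doc":   { "max": { "field": "@timestamp" } }
}
```

Then, assuming latest_doc ends up mapped as a date in the destination index (you may need to create the destination index with explicit mappings before starting the transform), you can filter out buckets that have not been updated recently, e.g.:

```
GET dest-index/_search
{
  "query": {
    "range": { "latest_doc": { "gte": "now-7d/d" } }
  }
}
```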
Your use case looks like data compaction, so rollup might be a better fit.
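For comparison, a rollup job could look roughly like this; the index names, cron schedule, interval and metrics are placeholders, not a recommendation for your data:

```
PUT _rollup/job/compact_example
{
  "index_pattern": "source-index-*",
  "rollup_index": "rollup-index",
  "cron": "0 0 * * * ?",
  "page_size": 1000,
  "groups": {
    "date_histogram": { "field": "@timestamp", "fixed_interval": "1h" },
    "terms": { "fields": [ "pivot_term" ] }
  },
  "metrics": [
    { "field": "value", "metrics": [ "avg", "min", "max" ] }
  ]
}
```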
I hope this helps!
(We are looking into further transform improvements, so it's very useful for us to hear about use cases like this.)
That answers my question very clearly. I was indeed planning on doing a continuous transform, though it was not apparent from my question or code.
While data compaction is what I am trying to achieve, rollup by itself will probably create multiple records for each unique pivot_term (one per time bucket), whereas I want a single record per unique pivot_term. I believe I can achieve that with some post-processing on the rollup index.
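Something like a _rollup_search with a terms aggregation on pivot_term might cover that post-processing, assuming the rollup job was configured with the avg metric on the value field (names here are placeholders, following the sketch above):

```
GET rollup-index/_rollup_search
{
  "size": 0,
  "aggregations": {
    "per_term": {
      "terms": { "field": "pivot_term" },
      "aggregations": {
        "avg_value": { "avg": { "field": "value" } }
      }
    }
  }
}
```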