Elastic Transforms - continuous mode is not detecting changes

I have a problem with my index transform. I have already tried many different variants, but nothing helped.

My documents in the source index look like this:

{
  "content" : {
    "creationTime" : "2022-07-25 16:00:49 +02:00",
    "orderId" : "552313",
    "state" : "SUCCESS",
    "category" : "catA"
  },
  "latestUpdate" : "2022-11-29 18:42:08 +01:00"
}

I want to use a transform to put the orders into time buckets of 1 minute (I originally wanted 30 minutes, but it looks like only 1 minute or 1 hour is supported and nothing in between, right?) and to filter out orders with the state ERROR.
Every time a document is updated, its "latestUpdate" timestamp is also set to the current time.

My source index is updated every 10 minutes, and each update overwrites some documents from the last week because their state field changes.

My transform request looks like this:

PUT _transform/transform_my_index001
{
  "source" : {
    "index" : "my_index",
    "query": {
      "bool": {
        "must_not": [
          {"term": {"content.state": "ERROR"}}
        ]
      }
    }
  },
  "pivot": {
    "group_by": {
      "creation_time": {
        "date_histogram": {
          "field": "content.creationTime",
          "calendar_interval": "1m",
          "time_zone": "Europe/Berlin"
        }
      },
      "portal": {
        "terms": {
          "field": "content.portal"
        }
      },
      "category": {
        "terms": {
          "field": "content.category"
        }
      }
    },
    "aggregations": {
      "order_count" : {
        "value_count": {
          "field": "content.state"
        }
      }
    }
  },
  "description": "Transform pipeline to put orders into a new index.",
  "dest": {
    "index": "aggregated_orders001"
  },
  "settings" : {
    "align_checkpoints" : false
  },
  "frequency": "5m",
  "sync": {
    "time": {
      "field": "latestUpdate",
      "delay": "60s"
    }
  }
}

The transform works for both old documents and newly arriving documents, but not for updates. If the state of a document changes to ERROR, the document is still counted in the transformed index. I thought this would be the perfect use case for continuous mode with my 'latestUpdate' field, but apparently I am doing something wrong. I found the "align_checkpoints" setting and hoped it would help, but the counts are still not updated correctly. Do I maybe need to tell Elastic the time format of the latestUpdate field, or its time zone?

Also: is there a better way to count the documents in each group than picking some field like content.state and running a value_count on it?

Any help would be very welcome and appreciated.

You are using calendar_interval, which only accepts single calendar units such as 1m or 1h. To use 30 minutes you have to use fixed_interval instead; see the docs for more details.
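
As a rough sketch of what that could look like for the date_histogram group from the question (the field name and time zone are copied from the original request, and 30m is just the interval the question asked about):

```json
{
  "group_by": {
    "creation_time": {
      "date_histogram": {
        "field": "content.creationTime",
        "fixed_interval": "30m",
        "time_zone": "Europe/Berlin"
      }
    }
  }
}
```

Only the interval key changes; the rest of the group_by stays as it was.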

The updates to ERROR are not picked up because you filter these documents out in the query. The query is applied at the very beginning, so the transform never sees documents whose state is ERROR and cannot take the change into account.

In order to let the transform see the errors, remove the filter from the source query and apply it as part of the aggregations instead:

"aggregations": {
  "order_count": {
    "filter": {
      "bool": {
        "must_not": [
          {"term": {"content.state": "ERROR"}}
        ]
      }
    }
  }
}
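
Put together with the original request, the relevant parts of the transform might look like the sketch below. It assumes that the must_not clause is removed from the source query entirely and that the rest of the request (dest, frequency, sync, settings) stays as posted; the portal group is left out only for brevity. Since the filter aggregation reports the number of matching documents per bucket, a value_count on an arbitrary field should no longer be needed.

```json
{
  "source": {
    "index": "my_index"
  },
  "pivot": {
    "group_by": {
      "creation_time": {
        "date_histogram": {
          "field": "content.creationTime",
          "calendar_interval": "1m",
          "time_zone": "Europe/Berlin"
        }
      },
      "category": {
        "terms": {
          "field": "content.category"
        }
      }
    },
    "aggregations": {
      "order_count": {
        "filter": {
          "bool": {
            "must_not": [
              { "term": { "content.state": "ERROR" } }
            ]
          }
        }
      }
    }
  }
}
```

One consequence of this layout is that a bucket whose orders are all in state ERROR would still be written to the destination index, with an order_count of 0.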

Thank you very much, that worked!

Do I still need to set align_checkpoints to false?

align_checkpoints solves a different use case.

align_checkpoints controls when buckets are created. It is a trade-off between recency and resource usage:

You configured your transform with an interval of 1m. Assume it is 13:21:42. With align_checkpoints: true the transform creates all buckets up to and including 13:20:00. With align_checkpoints: false the transform also creates the bucket 13:21:00, although that bucket isn't complete yet and new data could still arrive. As a consequence, the transform might update this bucket several times: at 13:22:42, for example, true would write this bucket for the first time, whereas false would update it. That makes false more expensive. However, with align_checkpoints: true your transformed index lags behind.

With an interval of 1m this doesn't really matter that much, but think about longer intervals, e.g. 1h.
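
For reference, a sketch of where the setting lives in the transform request. The sync block is copied from the question, and true is the documented default, so leaving out align_checkpoints should behave the same way:

```json
{
  "settings": {
    "align_checkpoints": true
  },
  "sync": {
    "time": {
      "field": "latestUpdate",
      "delay": "60s"
    }
  }
}
```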

