How to force Transform to run periodically?

roishibr · April 3, 2020, 2:26pm

Hi there!

Is there any way to run a Transform periodically (e.g., every 5 seconds), regardless of whether the sync date field of the source index has been updated?

I'm trying to use Transform to aggregate values over a sliding time window (e.g., now-24h), but new values only arrive in the source index when a new event occurs. As time goes by, the oldest values should drop out of the sliding window even when no new documents arrive.

I already tried using only the "frequency" param of Transform (without the "sync" param), but the transform was created in batch mode, not continuous. If I use the "sync" param, the transform keeps waiting for a document with a newer datetime before it executes.

Transform script:

PUT _transform/my_transform
{
  "source": {
    "index": [
      "source_index"
    ],
    "query": {
      "bool": {
        "must": [
          {
            "range": {
              "datetime": {
                "gte": "now-24h"
              }
            }
          }
        ]
      }
    }
  },
  "dest": {
    "index": "dest_index"
  },
  "frequency": "5s",
  "pivot": {
    "group_by": {
      "code": {
        "terms": {
          "field": "code.keyword"
        }
      }
    },
    "aggregations": {
      "agg24h": {
        "sum": {
          "field": "value"
        }
      }
    }
  }
}

Any suggestions? Is there a better way to do this?

Regards

Hendrik_Muhs · April 4, 2020, 5:11pm

The described use case is not possible at the moment.

If I understand correctly, you would like to run the full transform every 5 seconds?

A continuous transform is optimized to update only changed entities. If I understand correctly, you are looking to update the full dataset.

I wonder about your use case. Are you running further analysis on this data? Otherwise you could simply run the aggregation at query time. Why do you need a transform?
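For readers following along, running the aggregation at query time could look roughly like the sketch below. It only builds the search body for the same grouping and sum used in the transform above (fields `datetime`, `code.keyword`, `value`, and the name `agg24h` come from that script). The function name, the `by_code` aggregation name, and the `max_codes` parameter are hypothetical, and the plain `range` query is assumed to be equivalent to the single-clause `bool`/`must` in the original.

```python
import json


def build_query_time_search(window="24h", max_codes=10000):
    """Build a _search body that sums `value` per code over a sliding window.

    Hypothetical helper: `max_codes` is an assumed upper bound on the number
    of distinct codes, since a terms aggregation returns only 10 buckets
    by default.
    """
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": {"range": {"datetime": {"gte": f"now-{window}"}}},
        "aggs": {
            "by_code": {
                "terms": {"field": "code.keyword", "size": max_codes},
                "aggs": {"agg24h": {"sum": {"field": "value"}}},
            }
        },
    }


if __name__ == "__main__":
    # Print the body so it can be sent as POST source_index/_search
    print(json.dumps(build_query_time_search(), indent=2))
```

Because `now-24h` is evaluated each time the search runs, the window slides on every request without needing new documents in the source index.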

roishibr · April 6, 2020, 1:27pm

Hi Hendrik, thanks for your reply.

Yes, I'm trying to update the full dataset. My dataset consists of rainfall data coming from rain gauges (thousands of them). It is important for users (and for mathematical models) to know the accumulated precipitation over the past 1, 3, 6, 12, and 24 hours, up to 5 days.

We considered performing the aggregation at query time, but since the same data will be consumed many times by different users, we started investigating a more efficient method. The other option is to execute an aggregation query in a pipeline or an external process (e.g., Java or Python) and write the results to an index.

Do you have any suggestions? Do you think Transform will incorporate this functionality in a future version?
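A minimal sketch of the external-process option, assuming Python and the same fields as the transform above. It builds one search that computes every accumulation window per code using `filter` sub-aggregations, then flattens the response into one document per code that a scheduled job could write to the destination index. All names here (`WINDOWS`, `by_code`, the `acc_` prefix, both functions) are hypothetical; sending the request and indexing the results are left out.

```python
# Accumulation windows from the post, in ascending order (last is the largest).
WINDOWS = ["1h", "3h", "6h", "12h", "24h", "5d"]


def build_multi_window_search(windows=WINDOWS, max_codes=10000):
    """Build a _search body with one filtered sum per window and per code."""
    per_window = {
        f"acc_{w}": {
            "filter": {"range": {"datetime": {"gte": f"now-{w}"}}},
            "aggs": {"total": {"sum": {"field": "value"}}},
        }
        for w in windows
    }
    return {
        "size": 0,
        # Restrict the scan to the largest window; assumes ascending order.
        "query": {"range": {"datetime": {"gte": f"now-{windows[-1]}"}}},
        "aggs": {
            "by_code": {
                "terms": {"field": "code.keyword", "size": max_codes},
                "aggs": per_window,
            }
        },
    }


def buckets_to_docs(response, windows=WINDOWS):
    """Flatten a search response into one document per code."""
    docs = []
    for bucket in response["aggregations"]["by_code"]["buckets"]:
        doc = {"code": bucket["key"]}
        for w in windows:
            doc[f"acc_{w}"] = bucket[f"acc_{w}"]["total"]["value"]
        docs.append(doc)
    return docs
```

Using the code as the document ID when indexing would let each run overwrite the previous result for that gauge. With thousands of gauges, a composite aggregation with pagination may be a safer choice than a large terms `size`.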

Hendrik_Muhs · April 6, 2020, 2:17pm

This sounds like https://github.com/elastic/elasticsearch/issues/53798

Does this cover your use case?

roishibr · April 6, 2020, 4:59pm

Yes, it sounds like a similar use case, except that we are not using ML. I saw it was tagged as an enhancement. Do you think it will become a feature in a future release?

Hendrik_Muhs · April 6, 2020, 6:10pm

Yes, I think we will eventually add this. However, I can't say when.

roishibr · April 7, 2020, 8:14pm

Okay, thank you so much.

system · May 5, 2020, 8:14pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
