Is there any way to run a Transform periodically (e.g., every 5 seconds), regardless of whether the sync date field in the source index has been updated?
I'm trying to use a Transform to aggregate values over a sliding time window (e.g., now-24h), but new values only arrive in the source index when a new event occurs. As time passes, the sliding window should drop the oldest values even if no new document has arrived.
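To make the problem concrete, here is a minimal pure-Python sketch (no Elasticsearch involved; the readings and the `window_sum` helper are hypothetical) showing that a sliding-window total changes with the evaluation time alone, even when the underlying documents stay the same:

```python
from datetime import datetime, timedelta


def window_sum(readings, now, window):
    """Sum rainfall (mm) of readings whose timestamp falls in (now - window, now]."""
    start = now - window
    return sum(mm for ts, mm in readings if start < ts <= now)


# Hypothetical readings for a single gauge: (timestamp, rainfall in mm).
readings = [
    (datetime(2021, 3, 1, 8, 0), 4.0),
    (datetime(2021, 3, 1, 20, 0), 2.5),
]
day = timedelta(hours=24)

# Same documents, two evaluation times: the total shrinks once the 08:00
# reading falls out of the 24h window, although no new document arrived.
print(window_sum(readings, datetime(2021, 3, 2, 7, 0), day))  # 6.5
print(window_sum(readings, datetime(2021, 3, 2, 9, 0), day))  # 2.5
```

A continuous transform that is only triggered by new documents never gets the chance to produce the second value.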
I already tried using only the Transform "frequency" parameter (without the "sync" parameter), but the transform was then created in batch mode, not continuous mode. If I use the "sync" parameter, the transform waits for a document with a newer timestamp before it runs again.
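The two configurations described above can be sketched as request bodies for `PUT _transform/<transform_id>`. The index names (`rainfall`, `rainfall-by-gauge`) and field names (`gauge_id`, `rainfall_mm`, `timestamp`) are assumptions for illustration only:

```python
# Hypothetical index and field names; adjust them to your own mapping.
PIVOT = {
    "group_by": {"gauge_id": {"terms": {"field": "gauge_id"}}},
    "aggregations": {"rain_sum_mm": {"sum": {"field": "rainfall_mm"}}},
}

# Without "sync", the transform is a batch transform: it processes the
# source once and stops, so "frequency" alone does not make it recurring.
batch_transform = {
    "source": {"index": "rainfall"},
    "dest": {"index": "rainfall-by-gauge"},
    "frequency": "5s",
    "pivot": PIVOT,
}

# Adding "sync" makes it continuous: every "frequency" interval it checks for
# documents with a newer sync time field and updates only the affected entities.
continuous_transform = {
    **batch_transform,
    "sync": {"time": {"field": "timestamp", "delay": "60s"}},
}
```

In the continuous case, a gauge with no new readings is never recomputed, which is why the windowed totals go stale.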
The described use case is not possible at the moment.
If I understand correctly, you would like to run the full transform every 5 seconds?
A continuous transform is optimized to update only the entities that changed, whereas you are looking to update the full dataset.
I wonder about your use case. Are you running further analysis on this data? Otherwise you could simply run the aggregation at query time. Why do you need a transform?
Yes, I'm trying to update the full dataset. This is because my dataset consists of rainfall data from thousands of rain gauges. Users (and also mathematical models) need to know the accumulated precipitation over the past 1, 3, 6, 12 and 24 hours, up to 5 days.
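All of these windows can be computed in a single search request. The sketch below builds such a request body, assuming hypothetical fields `gauge_id`, `timestamp` and `rainfall_mm`. It uses a `composite` aggregation so that thousands of gauges can be paged through, and a keyed `date_range` aggregation whose ranges overlap on purpose (each starts at `now-<window>`):

```python
# Windows requested in the thread; assumed to be sorted from shortest to longest.
WINDOWS = ["1h", "3h", "6h", "12h", "24h", "5d"]


def accumulation_query(windows=WINDOWS, page_size=1000, after_key=None):
    """Build one page of a per-gauge, multi-window rainfall accumulation search."""
    composite = {
        "size": page_size,
        "sources": [{"gauge_id": {"terms": {"field": "gauge_id"}}}],
    }
    if after_key is not None:
        # Continue paging from the previous response's "after_key".
        composite["after"] = after_key
    return {
        "size": 0,  # aggregations only, no hits
        # Only read documents inside the longest window.
        "query": {"range": {"timestamp": {"gte": f"now-{windows[-1]}"}}},
        "aggs": {
            "per_gauge": {
                "composite": composite,
                "aggs": {
                    "windows": {
                        "date_range": {
                            "field": "timestamp",
                            "keyed": True,
                            "ranges": [{"key": w, "from": f"now-{w}"} for w in windows],
                        },
                        "aggs": {"rain_mm": {"sum": {"field": "rainfall_mm"}}},
                    }
                },
            }
        },
    }
```

Because `now` is resolved when the search runs, re-running the same body later yields correctly shifted windows.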
We considered performing the aggregation at query time, but since the same data will be consumed many times by different users, we started looking for a more efficient approach. The other option is to run an aggregation query in a pipeline or an external process (e.g., Java or Python) and write the results into an index.
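The external-process option could look roughly like the sketch below. It is an assumption, not an existing Elasticsearch feature: `search_page(after_key)` and `write_docs(docs)` are hypothetical callables you would implement with your own client (for example, a search returning a composite response shaped like the request sketched earlier, and a bulk index into a summary index). The loop recomputes every gauge on each run, pages through composite results, and sleeps for the rest of the interval:

```python
import time
from datetime import datetime, timezone


def to_summary_docs(response, computed_at):
    """Flatten one page of a composite + keyed date_range response into one doc per gauge."""
    docs = []
    for bucket in response["aggregations"]["per_gauge"]["buckets"]:
        doc = {"gauge_id": bucket["key"]["gauge_id"], "computed_at": computed_at}
        for window, window_bucket in bucket["windows"]["buckets"].items():
            doc[f"rain_{window}_mm"] = window_bucket["rain_mm"]["value"]
        docs.append(doc)
    return docs


def run_periodically(search_page, write_docs, interval_s=5.0, iterations=None):
    """Recompute all gauges every interval_s seconds (forever if iterations is None)."""
    done = 0
    while iterations is None or done < iterations:
        started = time.monotonic()
        computed_at = datetime.now(timezone.utc).isoformat()
        after_key = None
        while True:
            agg = search_page(after_key)["aggregations"]["per_gauge"]
            if not agg["buckets"]:
                break  # no more pages
            write_docs(to_summary_docs({"aggregations": {"per_gauge": agg}}, computed_at))
            after_key = agg.get("after_key")
            if after_key is None:
                break
        done += 1
        # Sleep only for what is left of the interval.
        time.sleep(max(0.0, interval_s - (time.monotonic() - started)))
```

Using `gauge_id` as the document `_id` when writing lets each run overwrite the previous totals, so the summary index always holds one up-to-date document per gauge.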
Do you have any suggestions? Do you think Transform will support this functionality in a future version?
Yes, it sounds like a similar use case, except that we are not using ML. I saw it was tagged as an enhancement. Do you think it will become a feature in a future release?