Can someone explain how the transform runs in this scenario

anjilinga · June 26, 2020, 5:20pm

Hi,

I am using a transform, and I am grouping the records by day to find the max.

But the frequency of the transform can be set to a maximum of 1 hour.

If the transform runs at the next hour, will it consider the records it already processed when grouping by day?

Example.
Time - field value
8:10 - 10
8:35 - 25
8:50 - 20

The transform runs at 9 and displays the max as 25.

9:20 - 20
9:50 - 10

The transform runs at 10. Will it consider the previous records and display 25?

10:25 - 30
10:55 - 20

The transform runs at 11. Will it consider the previous records and display 30?

If the delay field is 60s, it is not processing records. What is the exact significance of this field in the sync object?

mykael · June 27, 2020, 2:20pm

The sync field is used with the high-water mark and the delay to determine which records in the input index it will do the calculation for. Delay may need to be >= frequency to ensure it processes all records. It may be querying for sync > (now - delay) every frequency interval, which in your case is sync > (now - 60s) every hour, and that would miss 59 minutes' worth of data.

What I think happens when it runs is that it groups all of the input records. The date histogram on @timestamp works by whole calendar days, so it works out that all of the input records are on the same day. Then it queries the index for all records that occurred on that day, groups them, and writes the resulting documents to the output index, creating new versions of (or maybe updating) those documents as it goes.

anjilinga · June 27, 2020, 3:09pm

Thanks Clarke,

If the frequency is 1 hr and the transform runs at 10, it considers only the records newer than 9:59.
Will it lose the data from 9 to 9:59? Is my assumption correct?
When it runs at 10 or 11 or 12, it should consider the previous records from the source index. Will it pick up the missing data at that time?
If the sync field is not @timestamp, what is the formula for the delay value (if the sync field is another time field that lags the present time by 30 min)?
delay >= frequency + 30 min?

Hendrik_Muhs · June 29, 2020, 8:15pm

frequency defines how often the transform looks for new data and, in case of a failure, how quickly it retries. This setting only defines scheduling; it has no impact on how the data is transformed. In other words, using different frequencies does not lead to different data.

sync and its sub-setting delay do not impact how data is transformed, assuming you set them up correctly. delay defines the ingest delay; it answers the question: "When is it safe to query for data?" The time used for the timestamp field can have different delays. For example, if you feed the timestamp from an external system, you might batch data and send it every 5 minutes. In this case delay must be 5 minutes plus whatever it takes to transfer the data over the network and index it in Elasticsearch (refresh_interval).
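As a rough illustration of that sizing rule, here is a minimal Python sketch that adds up the three components. The function name is made up for this example, and the 10 second transfer time and 1 second refresh interval are assumed values, not measurements from any real cluster.

```python
from datetime import timedelta


def minimum_sync_delay(batch_interval, transfer_time, refresh_interval):
    """Smallest delay that lets a batched document become searchable
    before the transform queries for it (hypothetical helper)."""
    return batch_interval + transfer_time + refresh_interval


delay = minimum_sync_delay(
    batch_interval=timedelta(minutes=5),     # external system sends every 5 minutes
    transfer_time=timedelta(seconds=10),     # assumed network + indexing time
    refresh_interval=timedelta(seconds=1),   # assumed index refresh_interval
)

# Format as a time value string of the kind used for sync.time.delay.
delay_setting = f"{int(delay.total_seconds())}s"
print(delay_setting)  # 311s
```

In practice you would probably round this up to leave some headroom, since a delay that is too large only postpones updates, while one that is too small can miss documents.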

A continuous transform works in 2 steps:

1. Identify the data points that need to be updated.
2. Re-create the data points that need to be updated.

If you configure sync with a delay that is too low, step 1 might miss data points that need to be updated.

Regarding your example:

When the transform runs at 10, assuming it last ran at 9:

1. Query the source between 9 and 10 and, for example, identify that a and b have changed (but not c and d).
2. Query the source up to 10, filtered by a and b, and update the documents for a and b.

Note:

- In step 2 it queries all data, not just the data since the last run.
- a and b can be terms, but also ranges if you are grouping by date_histogram.
- If the query runs at 10, it does not query "lower than 10" but "lower than 10 - delay"; accordingly, the range in step 1 is 9 - delay <= x < 10 - delay.
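To make the two steps concrete for the example in the original question, here is a small Python simulation. It is only a sketch of the logic described above, not how the transform is actually implemented: the function names, the in-memory lists, the calendar date, and the midnight starting checkpoint are all assumptions made for illustration, and documents are assumed to become searchable the moment their timestamp says.

```python
from datetime import datetime, timedelta


def run_checkpoint(source, dest, last_upper, now, delay):
    """Simulate one run. source: list of (timestamp, value); dest: day -> max."""
    upper = now - delay
    # Step 1: find daily buckets touched by documents in [last_upper, upper).
    changed_days = {ts.date() for ts, _ in source if last_upper <= ts < upper}
    # Step 2: recompute each changed bucket from ALL data up to `upper`,
    # not only from the documents found in step 1.
    for day in changed_days:
        dest[day] = max(v for ts, v in source if ts.date() == day and ts < upper)
    return upper


def at(hour, minute=0):
    # The date is arbitrary; only the time of day matters here.
    return datetime(2020, 6, 26, hour, minute)


source = [
    (at(8, 10), 10), (at(8, 35), 25), (at(8, 50), 20),
    (at(9, 20), 20), (at(9, 50), 10),
    (at(10, 25), 30), (at(10, 55), 20),
]
dest = {}
delay = timedelta(seconds=60)
upper = at(0)  # hypothetical first checkpoint at midnight
results = []
for hour in (9, 10, 11):
    upper = run_checkpoint(source, dest, upper, at(hour), delay)
    results.append(dest[at(0).date()])

print(results)  # [25, 25, 30]
```

The run at 10 only detects the 9:20 and 9:50 documents in step 1, but because step 2 recomputes the whole day, the daily max stays at 25 and then moves to 30 after the run at 11.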

So again, both sync and frequency do not affect how data is transformed, only how and when it is updated. The transformation is defined in your group_by. If you are not getting the expected results, you might have a problem with your sync setup or a problem in your data.
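For reference, a daily-max transform of the kind discussed in this thread might be configured roughly as below, written here as a Python dict that serializes to the JSON request body. This is a sketch only: the index names, the "value" field, and the "max_value" and "day" names are placeholders, since the actual config was never posted.

```python
import json

# Hypothetical body for a continuous transform (PUT _transform/<transform_id>).
transform_config = {
    "source": {"index": "my-source-index"},   # placeholder index name
    "dest": {"index": "my-daily-max"},        # placeholder index name
    "frequency": "1h",                        # scheduling only
    "sync": {
        "time": {
            "field": "@timestamp",            # field used to detect new data
            "delay": "60s",                   # should cover the ingest delay
        }
    },
    "pivot": {
        # The group_by is what defines how the data is transformed.
        "group_by": {
            "day": {
                "date_histogram": {
                    "field": "@timestamp",
                    "calendar_interval": "1d",
                }
            }
        },
        "aggregations": {
            "max_value": {"max": {"field": "value"}}  # placeholder field
        },
    },
}

body = json.dumps(transform_config, indent=2)
print(body)
```

Changing frequency or delay in this body changes when the destination documents are updated, while the group_by and aggregations decide what they contain.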

I think I can better help you if you post your config and explain what you expect.

anjilinga · July 6, 2020, 5:26pm

Thanks Hendrik. I got the idea now.

