For a transform to update the destination index as new data arrives, it must run in continuous mode. Otherwise it runs as a batch transform, which is a one-off operation.
To use continuous mode you need to specify sync in your config. This can be done in the UI as well as through the API:
{
  "source": {...},
  "dest": {...},
  "pivot": {...},
  "sync": {
    "time": {
      "field": "a_timestamp_field",
      "delay": "60s" # <- optional, defaults to 60s
    }
  }
}
The configured timestamp field must meet two requirements: it must be based on the real (wall-clock) time, and it must not lie in the past. You can relax the second requirement with delay: with a delay of 60s, the transform will not read data until 60s have passed. That means data can arrive late and/or in any order within those 60s.
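To make the effect of delay concrete, here is a minimal Python sketch of the rule described above. It is a simplified model and not the actual transform implementation; the function name visible_to_transform and the sample values are assumptions made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def visible_to_transform(doc_time: datetime, now: datetime, delay: timedelta) -> bool:
    """Return True if a document's sync timestamp is old enough to be read.

    Simplified model: only documents with a timestamp older than
    `now - delay` are considered by the transform.
    """
    return doc_time < now - delay


now = datetime(2021, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
delay = timedelta(seconds=60)

fresh = now - timedelta(seconds=10)    # still inside the 60s delay window
settled = now - timedelta(seconds=90)  # older than the delay window

print(visible_to_transform(fresh, now, delay))    # False
print(visible_to_transform(settled, now, delay))  # True
```

A document whose timestamp is older than the delay window at the moment it is indexed is the case to avoid, which is why the timestamp must not lie further in the past than delay allows.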
I am not sure your timestamp field meets those requirements. Your example contains the same timestamp for both documents. Does that mean timestamp is the order date?
1. timestamp is the order date
If so, timestamp is not suitable for continuous operation. You could set delay to the maximum time within which an order can be cancelled, but then the transform would wait days before doing anything.
The solution is to add an ingest timestamp. This timestamp can be used for sync, while group_by can still use timestamp.
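One way to add such an ingest timestamp is an ingest pipeline with a set processor. The sketch below builds the relevant request bodies as Python dictionaries and prints them as JSON. The field name ingest_timestamp and the group_by name order_day are hypothetical, and the daily calendar_interval is an assumption based on the daily buckets you described.

```python
import json

# Hypothetical field name; adjust it to your own mapping.
INGEST_FIELD = "ingest_timestamp"

# Ingest pipeline body: the set processor writes the time at which the
# document passed through the ingest pipeline into INGEST_FIELD.
pipeline = {
    "description": "Add an ingest timestamp to every document",
    "processors": [
        {"set": {"field": INGEST_FIELD, "value": "{{_ingest.timestamp}}"}}
    ],
}

# Transform config fragment: sync on the ingest timestamp ...
sync = {"time": {"field": INGEST_FIELD, "delay": "60s"}}

# ... while still bucketing on the order date stored in `timestamp`.
group_by = {
    "order_day": {
        "date_histogram": {"field": "timestamp", "calendar_interval": "1d"}
    }
}

print(json.dumps({"pipeline": pipeline, "sync": sync, "group_by": group_by}, indent=2))
```

The point of the split is that sync only needs a field that follows the real clock, while the business date stays available for grouping.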
2. timestamp is already the ingest timestamp
If timestamp is already an ingest timestamp, or at least the timestamp of the operation as set by your application, you only need to ensure delay is configured properly: documents must arrive in Elasticsearch within the budget you set in delay, e.g. 60s.
However, now you have a problem with your group_by. You bin timestamp into daily buckets, but if timestamp isn't the order date, the values for a single order can be spread over several buckets. What you need is group_by(min(timestamp)), which is not possible with a single transform. For this case you have to group_by the transaction id (and maybe your other group_by fields) and add an aggregation order_date = min(timestamp). On this output you can either run another transform or create your visualization directly, which probably works fine because the first transform has already reduced the amount of data significantly.
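The first transform described above could look roughly like the following pivot fragment, again written as a Python dictionary. The field name transaction_id is an assumption, since I have not seen your mapping. The small simulation below it only illustrates what the min aggregation produces per transaction; it does not talk to Elasticsearch, and the sample documents are made up.

```python
import json
from collections import defaultdict

# Pivot fragment for the first transform: one output document per
# transaction, with order_date set to the earliest timestamp seen.
# `transaction_id` is an assumed field name.
pivot = {
    "group_by": {
        "transaction_id": {"terms": {"field": "transaction_id"}}
    },
    "aggregations": {
        "order_date": {"min": {"field": "timestamp"}}
    },
}

# Made-up sample documents: transaction t1 spans two calendar days.
docs = [
    {"transaction_id": "t1", "timestamp": "2021-01-01T23:59:00Z"},
    {"transaction_id": "t1", "timestamp": "2021-01-02T00:03:00Z"},
    {"transaction_id": "t2", "timestamp": "2021-01-02T10:00:00Z"},
]


def earliest_per_transaction(documents):
    """Group documents by transaction id and keep the minimum timestamp."""
    grouped = defaultdict(list)
    for doc in documents:
        grouped[doc["transaction_id"]].append(doc["timestamp"])
    # ISO 8601 UTC strings in the same format sort chronologically.
    return {tid: min(stamps) for tid, stamps in grouped.items()}


print(json.dumps(pivot, indent=2))
print(earliest_per_transaction(docs))
```

With order_date available per transaction, a second transform or a visualization can bin on order_date, so each order lands in exactly one daily bucket.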
I hope that helps. If not, please share some more information: the configuration you are using and a description of the fields, e.g. the nature of the timestamp field.
P.S.
I guess this contains a mistake: you are also grouping by id, right?