I'm an idiot. I have changed the time range to:
"range": {
  "processed_at": {
    "gte": "now-2h/h"
  }
}
So no code change is needed, and it is now really fast. A good strategy for time-based aggregations is:
- First, create batch transforms for the old data.
- Then, modify the transforms to:
  - add the sync parameters so that the transforms run as continuous;
  - add a time range to the queries, with a reasonable window, so that each iteration does not reprocess all the data.
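As a rough illustration of the two phases, here is a minimal Python sketch that only builds the transform request bodies. The index names, the pivot, the `delay` and the `frequency` values are assumptions for the example; only the `processed_at` field and the `now-2h/h` range come from my setup. It does not call the cluster.

```python
import json


def build_transform(source_index, dest_index, continuous, window="now-2h/h"):
    """Build a transform body: batch for the backfill, continuous afterwards."""
    body = {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        # Hypothetical pivot: one bucket per hour, counting documents.
        "pivot": {
            "group_by": {
                "hour": {
                    "date_histogram": {
                        "field": "processed_at",
                        "calendar_interval": "1h",
                    }
                }
            },
            "aggregations": {
                "doc_count": {"value_count": {"field": "processed_at"}}
            },
        },
    }
    if continuous:
        # Limit each iteration to recent data instead of the whole index.
        body["source"]["query"] = {"range": {"processed_at": {"gte": window}}}
        # The sync section is what makes the transform continuous.
        body["sync"] = {"time": {"field": "processed_at", "delay": "60s"}}
        body["frequency"] = "5m"  # assumed value
    return body


batch_body = build_transform("events", "events_hourly", continuous=False)
continuous_body = build_transform("events", "events_hourly", continuous=True)
print(json.dumps(continuous_body, indent=2))
```

The batch body has no query and no `sync`, so it processes all the old data once. The continuous body adds both, which is the part that made my transforms fast.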
I would also suggest to the Elastic team the possibility of using the previous checkpoint date in the query, like this:
"range": {
  "processed_at": {
    "gte": "{{checkpoint}}-2h/h"
  }
}
This would be more efficient (a more aggressive time range could be set) and safer: if the range is based on the checkpoint rather than on the current date, there is no risk of data loss when transforms are throttled.
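To show why I think the checkpoint-based bound is safer, here is a toy simulation in Python. It is only a sketch under assumptions: time is counted in minutes, there is one document per minute, the `/h` rounding is ignored, and `{{checkpoint}}` is my proposed placeholder, not an existing feature. One run is delayed to simulate throttling.

```python
WINDOW = 120  # minutes, mirrors the "-2h" in the range filter


def covered(run_times, lower_bound):
    """Return the document timestamps (minutes) seen by at least one run."""
    seen = set()
    previous = 0  # time of the previous checkpoint
    for run in run_times:
        start = max(0, lower_bound(run, previous))
        seen.update(range(start, run))
        previous = run
    return seen


def now_based(run, previous):
    # Equivalent of "gte": "now-2h"
    return run - WINDOW


def checkpoint_based(run, previous):
    # Equivalent of the proposed "gte": "{{checkpoint}}-2h"
    return previous - WINDOW


# Runs at 1h and 2h, then a throttled run at 8h, then a normal one at 9h.
run_times = [60, 120, 480, 540]
all_docs = set(range(0, 540))

missed_now = sorted(all_docs - covered(run_times, now_based))
missed_checkpoint = sorted(all_docs - covered(run_times, checkpoint_based))

print(len(missed_now), len(missed_checkpoint))
```

In this model the now-based window skips every document between the last normal run and two hours before the delayed run, while the checkpoint-based window always starts before the previous checkpoint and so leaves no gap.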