I'm running a transform over time-series/event data where documents, once written, never change.
I've talked to an Elastic engineer, Ben, on Slack, and it seems like I might have stumbled upon a bug.
I'm using this index config and this transform config.
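For anyone who doesn't want to follow the links: the setup is a continuous pivot transform, shaped roughly like this. This is only a sketch — the group_by, aggregations, and field names below are illustrative placeholders, not my actual config:

```json
PUT _transform/stats-transform
{
  "source": { "index": ["stats-*"] },
  "dest": { "index": "stats-summary" },
  "frequency": "1m",
  "sync": {
    "time": { "field": "timestamp", "delay": "60s" }
  },
  "pivot": {
    "group_by": {
      "user_id": { "terms": { "field": "user.id" } }
    },
    "aggregations": {
      "event_count": { "value_count": { "field": "event.id" } }
    }
  }
}
```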
The problem is that each checkpoint seems to process all the data. I've been running the transform on our indices (stats-[date]) and here are the stats (see the API call after the list):
- documents processed: 204B
- total indexing time: 14 min
- total search time: 108 hours
- total processing time: 25 seconds
- exponential avg documents processed: 221M
- exponential avg checkpoint duration: 7 min
We use Curator to delete old indices, so we always have 6 indices, each with under 50M docs; summed up, they currently come to 237M, right around the average documents processed per checkpoint. So it looks like the transform reprocesses all documents on every checkpoint.
Ben says there have been some improvements since 7.7.0, but I'm already on 7.9.1.
Here's our CPU utilization and load for the week since turning the transform on.
Since then, I've modified my transform to filter on the timestamp field; here's the new transform with a range query in the source.
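Concretely, the source section now looks roughly like this (a sketch, not my exact config; the index pattern and the `now-24h` bound are illustrative):

```json
{
  "source": {
    "index": ["stats-*"],
    "query": {
      "range": {
        "timestamp": {
          "gte": "now-24h"
        }
      }
    }
  }
}
```

The idea is that the range query narrows each checkpoint's search to recent documents, so the transform no longer scans the whole index pattern on every run.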
This time, however, it seems to be running fine. Is this expected behaviour?