Transform data mismatch with source index


I'm using transforms to roll up our access logs into daily aggregates, e.g. "customer A had 500 requests totalling 1000 credits".

I now want to verify that the transformed data adds up, i.e. that the request counts match. I run the same request against both indices, and the results are really close but not identical:

My intuition
The differences (last column) are so small that I suspect some kind of timing issue, but neither time zones nor transform lag make sense to me as an explanation.

Sources & code
Here's my transform code, along with the queries for the normal index and for the transform index. The two queries are basically identical, except that the one on the normal index needs an additional aggregation to sum the requests (the transform index already holds that sum) and the field name differs.

Any idea what might be going on?

Your queries define a time zone, but the transform does not. The transform pre-aggregates with a date_histogram, so the date bucketing is already done by the time the data lands in the transformed index. I think you should either define the time zone in the transform or configure its date_histogram with finer granularity, e.g. 1h. That way your query on the transformed index can still shift the buckets into the right days. As it stands, that precision is already lost after the transform.
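To illustrate why the bucket width matters, here is a small stand-alone Python sketch (not Elasticsearch code). It assumes a handful of made-up UTC timestamps and a query time zone of UTC+02:00; `pre_aggregate` and `daily_in_query_tz` are hypothetical helpers that only mimic what the transform and the query do.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Made-up request timestamps in UTC; two fall in the last two hours of the UTC day.
events = [
    datetime(2023, 5, 1, 10, 0, tzinfo=timezone.utc),
    datetime(2023, 5, 1, 22, 30, tzinfo=timezone.utc),
    datetime(2023, 5, 1, 23, 15, tzinfo=timezone.utc),
    datetime(2023, 5, 2, 9, 0, tzinfo=timezone.utc),
]

QUERY_TZ = timezone(timedelta(hours=2))  # assumed query time zone (UTC+02:00)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def pre_aggregate(timestamps, interval):
    """Mimic the transform: count events per UTC bucket of the given width."""
    counts = Counter()
    for ts in timestamps:
        bucket_start = EPOCH + ((ts - EPOCH) // interval) * interval
        counts[bucket_start] += 1
    return counts


def daily_in_query_tz(bucket_counts):
    """Mimic the query: regroup (timestamp -> count) pairs by local calendar day."""
    daily = Counter()
    for ts, count in bucket_counts.items():
        daily[ts.astimezone(QUERY_TZ).date()] += count
    return dict(daily)


# Query on the raw index: every event is bucketed by its own local day.
source_daily = daily_in_query_tz(Counter(events))
# Query on a transform index that was pre-bucketed per UTC day.
from_daily = daily_in_query_tz(pre_aggregate(events, timedelta(days=1)))
# Query on a transform index that was pre-bucketed per hour.
from_hourly = daily_in_query_tz(pre_aggregate(events, timedelta(hours=1)))

print("source:     ", source_daily)
print("daily (UTC):", from_daily)
print("hourly:     ", from_hourly)
```

In this sketch the overall totals agree in all three cases, but the per-day counts built from UTC-day buckets differ from the source, while the ones built from hourly buckets match it. Hourly buckets are enough here only because the assumed offset is a whole number of hours.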

Another reason for the mismatch might be the terms grouping. Can you verify that the customer field is never null? By default, a transform skips documents in which a group_by field is missing, unless you set missing_bucket to true.
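As a rough sketch of what such a transform definition could look like, here is the request body written as a Python dict. The original transform was not preserved in this thread, so every index and field name below (`access-logs`, `access-logs-hourly`, `@timestamp`, `customer`, `credits`) is an assumption, not the poster's actual configuration.

```python
import json

# Hypothetical body for PUT _transform/<transform_id>; all names are placeholders.
transform_body = {
    "source": {"index": "access-logs"},
    "dest": {"index": "access-logs-hourly"},
    "pivot": {
        "group_by": {
            "customer": {
                "terms": {
                    "field": "customer",
                    # Keep documents without a customer in a bucket with a null key.
                    # The alternative is to filter those documents out of both queries.
                    "missing_bucket": True,
                }
            },
            "hour": {
                "date_histogram": {
                    "field": "@timestamp",
                    # Hourly buckets, so a later query can still regroup them
                    # into days in its own time zone.
                    "calendar_interval": "1h",
                }
            },
        },
        "aggregations": {
            "requests": {"value_count": {"field": "@timestamp"}},
            "credits": {"sum": {"field": "credits"}},
        },
    },
}

print(json.dumps(transform_body, indent=2))
```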

You were spot-on. I set the granularity to 1h and filtered to where customer is set.
Thank you.
