Transform data mismatch with source index

rokcarl · July 14, 2022, 2:11pm

Hi,

I'm using transforms to roll up our access logs into daily aggregates, e.g. "customer A had 500 requests totalling 1000 credits".

Problem
I now want to verify that the transformed data adds up, i.e. that the request counts match. I run the same query against both indices, and the results are very close but not identical:

My intuition
The differences (last column) are so small that I suspect a timing issue, but neither timezones nor transform lag make sense to me as the cause.

Sources & code
Here's my transform code, plus the queries for the normal index and for the transform index. The two queries are basically identical, except that the one on the normal index needs an additional aggregation to sum the requests (the transform index already has that sum) and uses a different field name.

Any idea what might be going on?

Hendrik_Muhs · July 14, 2022, 3:26pm

Your queries define a timezone, but the transform does not. The transform pre-aggregates with a date_histogram, so the date bucketing is already baked into the transformed index (in UTC, unless a time zone is configured). I think you should either define the same timezone in the transform or configure its date_histogram with finer granularity, e.g. 1h, so that your query on the transformed index can re-bucket into its own timezone. As it stands, that precision is already lost once the transform has run.
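
To illustrate the bucketing point, here is a minimal Python sketch, not your actual setup: the timestamps and the fixed UTC+2 query timezone are made up, and a plain Counter stands in for the date_histogram. It counts requests per local day three ways: straight from the raw events, from daily UTC buckets, and from hourly UTC buckets.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumption: the query uses a fixed UTC+2 timezone (hypothetical).
QUERY_TZ = timezone(timedelta(hours=2))

# Hypothetical access-log timestamps in UTC, one request each.
events = [
    datetime(2022, 7, 13, 21, 30, tzinfo=timezone.utc),
    datetime(2022, 7, 13, 22, 15, tzinfo=timezone.utc),  # already 14 July at UTC+2
    datetime(2022, 7, 13, 23, 45, tzinfo=timezone.utc),  # already 14 July at UTC+2
    datetime(2022, 7, 14, 10, 0, tzinfo=timezone.utc),
]

def local_day(ts):
    """Calendar day of a timestamp as seen in the query timezone."""
    return ts.astimezone(QUERY_TZ).date()

def rebucket(pre_aggregated):
    """Assign each pre-aggregated bucket (keyed by its start time) to a local day."""
    per_day = Counter()
    for bucket_start, count in pre_aggregated.items():
        per_day[local_day(bucket_start)] += count
    return per_day

# Reference: what a timezone-aware query on the source index would count.
source_counts = Counter(local_day(ts) for ts in events)

# Pre-aggregation with daily UTC buckets (no timezone on the transform).
daily_utc = Counter(
    ts.replace(hour=0, minute=0, second=0, microsecond=0) for ts in events
)

# Pre-aggregation with hourly UTC buckets.
hourly_utc = Counter(ts.replace(minute=0, second=0, microsecond=0) for ts in events)

from_daily = rebucket(daily_utc)
from_hourly = rebucket(hourly_utc)

print("source:     ", dict(source_counts))
print("from daily: ", dict(from_daily))
print("from hourly:", dict(from_hourly))
```

In this sketch the overall total is the same in all three cases, but the per-day counts from the daily buckets are shifted because requests near midnight land on the wrong day. The hourly buckets can still be reassigned correctly, as long as the timezone offset is a whole number of hours.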

Another reason for the mismatch might be the terms grouping. Can you verify that the customer field is never null? By default, the transform ignores documents where that field is missing, unless you set missing_bucket to true.
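
The effect of a missing group key can be sketched in plain Python as well. The documents and field names below are hypothetical, and the function only mimics the skip-or-keep behaviour described above; it is not the transform implementation. The dict at the end shows roughly where missing_bucket and the 1h interval would sit in a pivot's group_by, assuming fields named customer and @timestamp.

```python
from collections import Counter

# Hypothetical access-log documents; field names are assumptions.
docs = [
    {"customer": "A", "credits": 2},
    {"customer": "A", "credits": 3},
    {"customer": None, "credits": 1},  # customer is not set
    {"customer": "B", "credits": 5},
]

def group_requests(docs, missing_bucket=False):
    """Count requests per customer, mimicking a terms group_by."""
    counts = Counter()
    for doc in docs:
        key = doc.get("customer")
        if key is None and not missing_bucket:
            continue  # skipped, as with the default setting
        counts[key] += 1
    return counts

default_counts = group_requests(docs)
with_missing = group_requests(docs, missing_bucket=True)

print(sum(default_counts.values()), "of", len(docs), "requests kept by default")
print(sum(with_missing.values()), "of", len(docs), "requests kept with missing_bucket")

# Sketch of a pivot group_by fragment (field names are assumptions).
group_by = {
    "customer": {"terms": {"field": "customer", "missing_bucket": True}},
    "hour": {"date_histogram": {"field": "@timestamp", "calendar_interval": "1h"}},
}
```

The alternative to missing_bucket is to filter out documents without a customer on both sides of the comparison, so that the source query and the transform see the same set of documents.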

rokcarl · July 15, 2022, 7:07pm

You were spot-on. I set the granularity to 1h and filtered to documents where customer is set.
Thank you.

