Getting the latest data per user_id in time series data without latest transforms?

I have a users index. Each user has a status: "New", "Waiting", or "Completed". A status can go from "Completed" back to "New". So as time series data it looks something like this:

user_id, status, timestamp
1, NEW, 2023-06-30 5:00
1, WAITING, 2023-06-30 6:00
1, COMPLETED, 2023-06-30 7:00
1, NEW, 2023-06-30 8:00

2, NEW, 2023-06-30 5:00
2, WAITING, 2023-06-30 6:00
2, COMPLETED, 2023-06-30 7:00

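I have millions of users. I would like to get the number of users who are currently in the NEW state. The result should be 1 (only user 1 is NEW right now), but with pivot aggregations I get count = 2, because the aggregation counts every user that has ever had a NEW record, including user 2, which has since moved on to COMPLETED. The same problem appears in reverse for COMPLETED: user 1 was COMPLETED at some point but then went back to "NEW".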
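To make the difference concrete, here is a small sketch in plain Python (no Elasticsearch involved) that replays the sample events above and compares two counts: distinct users who ever had a given status, versus users whose latest event has that status. The function names are hypothetical, and the timestamps are zero-padded (an assumption, unlike the sample above) so that they compare correctly as strings.

```python
from collections import Counter

# Sample events from the question: (user_id, status, timestamp).
# Hours are zero-padded so plain string comparison orders them correctly.
EVENTS = [
    (1, "NEW", "2023-06-30 05:00"),
    (1, "WAITING", "2023-06-30 06:00"),
    (1, "COMPLETED", "2023-06-30 07:00"),
    (1, "NEW", "2023-06-30 08:00"),
    (2, "NEW", "2023-06-30 05:00"),
    (2, "WAITING", "2023-06-30 06:00"),
    (2, "COMPLETED", "2023-06-30 07:00"),
]


def users_ever_in(events, status):
    """Naive count: distinct users with at least one event in `status`."""
    return len({user for user, s, _ in events if s == status})


def latest_status_per_user(events):
    """Keep only the most recent status for each user."""
    latest = {}  # user_id -> (timestamp, status)
    for user, status, ts in events:
        if user not in latest or ts > latest[user][0]:
            latest[user] = (ts, status)
    return {user: status for user, (_, status) in latest.items()}


def users_currently_in(events, status):
    """Correct count: users whose latest event has `status`."""
    return Counter(latest_status_per_user(events).values())[status]


if __name__ == "__main__":
    print(users_ever_in(EVENTS, "NEW"))             # 2
    print(users_currently_in(EVENTS, "NEW"))        # 1
    print(users_currently_in(EVENTS, "COMPLETED"))  # 1
```

Whatever runs in Elasticsearch has to do the equivalent of `latest_status_per_user` first (reduce each user to their newest document) and only then count by status.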

Latest transforms are too expensive in terms of usage, so we would rather run these queries on demand. Is this possible with collapse or with pipeline aggregations? Could someone help me here?

Can you ask your question in a different way? You mentioned you have an existing index; do you also have time series data? And are you saying the user index is being updated so that the status periodically moves from "Completed" back to "New"?

Yes, I have a users_time_series index. The user status goes from NEW to WAITING to COMPLETED. In some cases it can go NEW > WAITING > COMPLETED > NEW.

Since it's time series data, we just append to the index. I want to get the count of users who are currently in COMPLETED status. I can do this with a latest transform, but I was told our usage isn't high enough to justify one, so we must do it without transforms.
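One option worth trying on demand, sketched in Python under stated assumptions: it is neither collapse nor a pipeline aggregation, but a `composite` aggregation over `user_id` with a `top_metrics` sub-aggregation that returns the `status` of each user's newest document, tallied client-side while paging with `after_key`. It assumes `user_id` and `status` are keyword fields, `timestamp` is a date field, and a cluster version where `top_metrics` accepts keyword metrics. The helper names and the `search` callable are hypothetical. As far as I know, collapse on its own does not produce this count, because `hits.total` and aggregations are computed over the uncollapsed documents.

```python
from collections import Counter
from typing import Any, Callable, Dict, Optional

# Index and field names are taken from the thread; the mappings are assumed.
INDEX = "users_time_series"


def build_request(after: Optional[Dict[str, Any]] = None,
                  page_size: int = 1000) -> Dict[str, Any]:
    """One page of a composite agg: one bucket per user_id, latest status each."""
    composite: Dict[str, Any] = {
        "size": page_size,
        "sources": [{"user_id": {"terms": {"field": "user_id"}}}],
    }
    if after is not None:
        composite["after"] = after  # resume after the previous page's last key
    return {
        "size": 0,  # only aggregations are needed, no hits
        # No status filter here on purpose: it would hide later events.
        "aggs": {
            "users": {
                "composite": composite,
                "aggs": {
                    # Newest document per user (timestamp descending),
                    # returning that document's `status` value.
                    "latest": {
                        "top_metrics": {
                            "metrics": {"field": "status"},
                            "sort": {"timestamp": "desc"},
                        }
                    }
                },
            }
        },
    }


def count_current_statuses(search: Callable[[Dict[str, Any]], Dict[str, Any]]) -> Counter:
    """Page through every user bucket and tally each user's latest status.

    `search` is a hypothetical callable that sends a request body to INDEX
    and returns the parsed JSON response.
    """
    counts: Counter = Counter()
    after = None
    while True:
        agg = search(build_request(after))["aggregations"]["users"]
        for bucket in agg["buckets"]:
            top = bucket["latest"]["top"]
            if top:  # skip users with no status value
                counts[top[0]["metrics"]["status"]] += 1
        after = agg.get("after_key")
        if not agg["buckets"] or after is None:
            break  # no more pages
    return counts
```

The query deliberately has no status filter: restricting it to COMPLETED documents would still count user 1, whose COMPLETED event is followed by a later NEW one. The cost is one bucket per user, so millions of users means many pages per count, but nothing runs between queries. With the 8.x `elasticsearch` Python client, `search` could be something like `lambda body: es.search(index=INDEX, **body)`; check that against your client version. `count_current_statuses(search)["COMPLETED"]` then gives the number of users currently in COMPLETED.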
